[2023-11-09 22:22:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 663): INFO Full config saved to work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/config.json
[2023-11-09 22:22:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 666): INFO AMP_OPT_LEVEL: O1
AMP_TYPE: float16
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 1.0
  CUTMIX_MINMAX: null
  MEAN:
  - 0.485
  - 0.456
  - 0.406
  MIXUP: 0.8
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RANDOM_RESIZED_CROP: false
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
  STD:
  - 0.229
  - 0.224
  - 0.225
BASE:
- ''
DATA:
  BATCH_SIZE: 128
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /mnt/petrelfs/share/images
  IMG_ON_MEMORY: false
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  TRANSFORM: build_transform_for_linear_probe
  ZIP_MODE: false
EVAL_22K_TO_1K: false
EVAL_FREQ: 1
EVAL_MODE: false
LOCAL_RANK: 0
MODEL:
  DROP_PATH_RATE: 0.0
  DROP_PATH_TYPE: linear
  DROP_RATE: 0.0
  INTERN_IMAGE:
    CENTER_FEATURE_SCALE: false
    CHANNELS: 64
    CORE_OP: DCNv3
    DEPTHS:
    - 4
    - 4
    - 18
    - 4
    DW_KERNEL_SIZE: null
    GROUPS:
    - 4
    - 8
    - 16
    - 32
    LAYER_SCALE: null
    LEVEL2_POST_NORM: false
    LEVEL2_POST_NORM_BLOCK_IDS: null
    MLP_RATIO: 4.0
    OFFSET_SCALE: 1.0
    POST_NORM: false
    REMOVE_CENTER: false
    RES_POST_NORM: false
    USE_CLIP_PROJECTOR: false
  INTERN_VIT_6B:
    CLS_TARGET: cls_patch_concat
    DEPTH: 48
    EMBED_DIM: 3200
    FREEZE_VIT: true
    INIT_VALUES: 0.1
    MLP_RATIO: 4
    NUM_HEADS: 25
    OUT_INDICES:
    - 47
    PATCH_SIZE: 14
    PRETRAINED: ./pretrained/intern_vit_6b_224px.pth
    PRETRAIN_SIZE: 224
    QKV_BIAS: false
    QK_NORMALIZATION: true
    USE_FLASH_ATTN: true
  LABEL_SMOOTHING: 0.1
  NAME: intern_vit_6b_1k_224_cls_patch_sgd_lr0.1
  NUM_CLASSES: 1000
  PRETRAINED: ''
  RESUME: ''
  TYPE: intern_vit_6b
OUTPUT: work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1
PRINT_FREQ: 10
SAVE_CKPT_NUM: 1
SAVE_FREQ: 1
SEED: 0
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 1
  AUTO_RESUME: true
  BASE_LR: 0.2
  CLIP_GRAD: 5.0
  EMA:
    DECAY: 0.998
    ENABLE: true
  EPOCHS: 10
  LR_LAYER_DECAY: false
  LR_LAYER_DECAY_RATIO: 0.875
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    NAME: cosine
  MIN_LR: 0.0
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    DCN_LR_MUL: null
    EPS: 1.0e-08
    FREEZE_BACKBONE: null
    MOMENTUM: 0.9
    NAME: sgd
    USE_ZERO: false
  RAND_INIT_FT_HEAD: false
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 1
  WARMUP_LR: 0.0
  WEIGHT_DECAY: 0.0

[2023-11-09 22:22:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 173): INFO Creating model:intern_vit_6b/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1
[2023-11-09 22:24:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 176): INFO InternViT6B(
  (patch_embed): PatchEmbed(
    (proj): Conv2d(3, 3200, kernel_size=(14, 14), stride=(14, 14))
    (norm): Identity()
  )
  (pos_drop): Identity()
  (blocks): ModuleList(
    (0): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (1): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (2): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (3): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (4): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (5): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (6): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (7): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (8): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (9): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (10): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (11): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (12): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (13): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (14): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (15): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (16): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (17): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (18): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (19): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (20): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (21): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (22): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (23): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (24): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (25): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (26): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (27): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (28): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (29): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (30): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (31): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (32): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (33): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (34): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (35): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (36): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (37): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (38): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (39): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (40): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (41): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (42): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (43): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (44): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (45): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (46): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (47): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
  )
  (clip_projector): AttentionPoolingBlock(
    (norm1_q): LayerNorm((3200,), eps=1e-05, elementwise_affine=True)
    (norm1_k): LayerNorm((3200,), eps=1e-05, elementwise_affine=True)
    (norm1_v): LayerNorm((3200,), eps=1e-05, elementwise_affine=True)
    (cross_attn): CrossAttention(
      (q): Linear(in_features=3200, out_features=3200, bias=False)
      (k): Linear(in_features=3200, out_features=3200, bias=False)
      (v): Linear(in_features=3200, out_features=3200, bias=False)
      (attn_drop): Dropout(p=0.0, inplace=False)
      (proj): Linear(in_features=3200, out_features=768, bias=True)
      (proj_drop): Dropout(p=0.0, inplace=False)
    )
    (drop_path): Identity()
  )
  (head): Linear(in_features=6400, out_features=1000, bias=True)
)
[2023-11-09 22:24:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 212): INFO Using native Torch AMP. Training in mixed precision.
[2023-11-09 22:24:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 232): INFO number of params: 6401000
[2023-11-09 22:24:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 266): INFO no checkpoint found in work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1, ignoring auto resume
[2023-11-09 22:24:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 307): INFO Start training
[2023-11-09 22:24:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][0/1251]	eta 3:10:53 lr 0.000000	time 9.1551 (9.1551)	model_time 5.9911 (5.9911)	loss 6.9080 (6.9080)	grad_norm 0.0390 (0.0390/0.0000)	mem 48414MB
[2023-11-09 22:24:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][10/1251]	eta 1:01:44 lr 0.001599	time 2.3801 (2.9854)	model_time 2.3799 (2.6974)	loss 6.9079 (6.9081)	grad_norm 0.0403 (0.0404/0.0008)	mem 48463MB
[2023-11-09 22:25:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][20/1251]	eta 0:55:23 lr 0.003197	time 2.3878 (2.6994)	model_time 2.3875 (2.5484)	loss 6.9083 (6.9080)	grad_norm 0.0404 (0.0402/0.0007)	mem 48463MB
[2023-11-09 22:25:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][30/1251]	eta 0:52:54 lr 0.004796	time 2.3913 (2.5997)	model_time 2.3911 (2.4972)	loss 6.9086 (6.9080)	grad_norm 0.0410 (0.0402/0.0007)	mem 48463MB
[2023-11-09 22:26:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][40/1251]	eta 0:51:26 lr 0.006395	time 2.3897 (2.5483)	model_time 2.3895 (2.4708)	loss 6.9069 (6.9079)	grad_norm 0.0402 (0.0401/0.0007)	mem 48463MB
[2023-11-09 22:26:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][50/1251]	eta 0:50:23 lr 0.007994	time 2.3920 (2.5173)	model_time 2.3918 (2.4549)	loss 6.9076 (6.9079)	grad_norm 0.0389 (0.0401/0.0008)	mem 48463MB
[2023-11-09 22:26:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][60/1251]	eta 0:49:37 lr 0.009592	time 2.3922 (2.4997)	model_time 2.3920 (2.4475)	loss 6.9084 (6.9078)	grad_norm 0.0402 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:27:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][70/1251]	eta 0:48:54 lr 0.011191	time 2.3895 (2.4844)	model_time 2.3892 (2.4395)	loss 6.9079 (6.9078)	grad_norm 0.0399 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:27:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][80/1251]	eta 0:48:15 lr 0.012790	time 2.3893 (2.4729)	model_time 2.3891 (2.4334)	loss 6.9083 (6.9077)	grad_norm 0.0389 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:28:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][90/1251]	eta 0:47:40 lr 0.014388	time 2.3926 (2.4639)	model_time 2.3922 (2.4287)	loss 6.9077 (6.9078)	grad_norm 0.0407 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:28:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][100/1251]	eta 0:47:07 lr 0.015987	time 2.3917 (2.4566)	model_time 2.3915 (2.4249)	loss 6.9088 (6.9078)	grad_norm 0.0402 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:28:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][110/1251]	eta 0:46:36 lr 0.017586	time 2.3942 (2.4507)	model_time 2.3939 (2.4218)	loss 6.9070 (6.9078)	grad_norm 0.0423 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:29:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][120/1251]	eta 0:46:06 lr 0.019185	time 2.3877 (2.4457)	model_time 2.3874 (2.4192)	loss 6.9065 (6.9078)	grad_norm 0.0395 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:29:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][130/1251]	eta 0:45:36 lr 0.020783	time 2.3909 (2.4415)	model_time 2.3906 (2.4170)	loss 6.9109 (6.9078)	grad_norm 0.0406 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:30:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][140/1251]	eta 0:45:08 lr 0.022382	time 2.3920 (2.4380)	model_time 2.3918 (2.4151)	loss 6.9067 (6.9077)	grad_norm 0.0392 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:30:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][150/1251]	eta 0:44:40 lr 0.023981	time 2.3877 (2.4348)	model_time 2.3874 (2.4134)	loss 6.9087 (6.9077)	grad_norm 0.0397 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:30:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][160/1251]	eta 0:44:13 lr 0.025580	time 2.3889 (2.4319)	model_time 2.3886 (2.4119)	loss 6.9075 (6.9076)	grad_norm 0.0405 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:31:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][170/1251]	eta 0:43:46 lr 0.027178	time 2.3939 (2.4295)	model_time 2.3936 (2.4107)	loss 6.9067 (6.9076)	grad_norm 0.0411 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:31:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][180/1251]	eta 0:43:19 lr 0.028777	time 2.3910 (2.4274)	model_time 2.3907 (2.4095)	loss 6.9059 (6.9076)	grad_norm 0.0397 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:32:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][190/1251]	eta 0:42:53 lr 0.030376	time 2.3879 (2.4254)	model_time 2.3875 (2.4085)	loss 6.9063 (6.9076)	grad_norm 0.0400 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:32:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][200/1251]	eta 0:42:27 lr 0.031974	time 2.3899 (2.4237)	model_time 2.3895 (2.4076)	loss 6.9059 (6.9075)	grad_norm 0.0390 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:32:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][210/1251]	eta 0:42:01 lr 0.033573	time 2.3901 (2.4221)	model_time 2.3898 (2.4067)	loss 6.9060 (6.9075)	grad_norm 0.0388 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:33:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][220/1251]	eta 0:41:35 lr 0.035172	time 2.3912 (2.4207)	model_time 2.3908 (2.4060)	loss 6.9061 (6.9075)	grad_norm 0.0394 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:33:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][230/1251]	eta 0:41:10 lr 0.036771	time 2.3929 (2.4194)	model_time 2.3927 (2.4053)	loss 6.9062 (6.9075)	grad_norm 0.0395 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:34:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][240/1251]	eta 0:40:44 lr 0.038369	time 2.3913 (2.4183)	model_time 2.3911 (2.4047)	loss 6.9055 (6.9074)	grad_norm 0.0407 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:34:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][250/1251]	eta 0:40:19 lr 0.039968	time 2.3925 (2.4172)	model_time 2.3922 (2.4042)	loss 6.9057 (6.9074)	grad_norm 0.0391 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:34:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][260/1251]	eta 0:39:54 lr 0.041567	time 2.3876 (2.4162)	model_time 2.3873 (2.4037)	loss 6.9061 (6.9074)	grad_norm 0.0400 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:35:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][270/1251]	eta 0:39:29 lr 0.043165	time 2.3928 (2.4153)	model_time 2.3925 (2.4032)	loss 6.9062 (6.9073)	grad_norm 0.0427 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:35:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][280/1251]	eta 0:39:04 lr 0.044764	time 2.3914 (2.4144)	model_time 2.3911 (2.4027)	loss 6.9050 (6.9073)	grad_norm 0.0406 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:36:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][290/1251]	eta 0:38:39 lr 0.046363	time 2.3892 (2.4136)	model_time 2.3889 (2.4023)	loss 6.9045 (6.9072)	grad_norm 0.0396 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:36:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][300/1251]	eta 0:38:14 lr 0.047962	time 2.3893 (2.4128)	model_time 2.3891 (2.4019)	loss 6.9072 (6.9072)	grad_norm 0.0389 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:36:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][310/1251]	eta 0:37:49 lr 0.049560	time 2.3893 (2.4121)	model_time 2.3890 (2.4015)	loss 6.9086 (6.9072)	grad_norm 0.0396 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:37:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][320/1251]	eta 0:37:24 lr 0.051159	time 2.3962 (2.4114)	model_time 2.3959 (2.4011)	loss 6.9065 (6.9071)	grad_norm 0.0391 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:37:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][330/1251]	eta 0:37:00 lr 0.052758	time 2.3900 (2.4108)	model_time 2.3897 (2.4009)	loss 6.9029 (6.9071)	grad_norm 0.0399 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:38:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][340/1251]	eta 0:36:35 lr 0.054357	time 2.3889 (2.4103)	model_time 2.3887 (2.4006)	loss 6.9063 (6.9070)	grad_norm 0.0396 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:38:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][350/1251]	eta 0:36:11 lr 0.055955	time 2.3928 (2.4098)	model_time 2.3926 (2.4004)	loss 6.9037 (6.9070)	grad_norm 0.0407 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:38:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][360/1251]	eta 0:35:46 lr 0.057554	time 2.3936 (2.4093)	model_time 2.3932 (2.4001)	loss 6.9028 (6.9070)	grad_norm 0.0385 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:39:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][370/1251]	eta 0:35:22 lr 0.059153	time 2.3907 (2.4088)	model_time 2.3904 (2.3999)	loss 6.9026 (6.9069)	grad_norm 0.0396 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:39:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][380/1251]	eta 0:34:58 lr 0.060751	time 2.3909 (2.4088)	model_time 2.3907 (2.4001)	loss 6.9034 (6.9068)	grad_norm 0.0406 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:40:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][390/1251]	eta 0:34:33 lr 0.062350	time 2.3921 (2.4083)	model_time 2.3919 (2.3998)	loss 6.9049 (6.9068)	grad_norm 0.0402 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:40:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][400/1251]	eta 0:34:09 lr 0.063949	time 2.3889 (2.4079)	model_time 2.3886 (2.3997)	loss 6.9070 (6.9068)	grad_norm 0.0406 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:40:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][410/1251]	eta 0:33:44 lr 0.065548	time 2.3914 (2.4075)	model_time 2.3912 (2.3994)	loss 6.9057 (6.9067)	grad_norm 0.0386 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:41:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][420/1251]	eta 0:33:20 lr 0.067146	time 2.3885 (2.4071)	model_time 2.3883 (2.3992)	loss 6.9064 (6.9067)	grad_norm 0.0390 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:41:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][430/1251]	eta 0:32:55 lr 0.068745	time 2.3945 (2.4068)	model_time 2.3942 (2.3990)	loss 6.9063 (6.9066)	grad_norm 0.0392 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:41:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][440/1251]	eta 0:32:31 lr 0.070344	time 2.3894 (2.4064)	model_time 2.3892 (2.3988)	loss 6.9020 (6.9066)	grad_norm 0.0398 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:42:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][450/1251]	eta 0:32:07 lr 0.071942	time 2.3925 (2.4060)	model_time 2.3922 (2.3986)	loss 6.9041 (6.9065)	grad_norm 0.0397 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:42:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][460/1251]	eta 0:31:42 lr 0.073541	time 2.3854 (2.4057)	model_time 2.3852 (2.3985)	loss 6.9042 (6.9065)	grad_norm 0.0395 (0.0399/0.0009)	mem 48463MB
[2023-11-09 22:43:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][470/1251]	eta 0:31:18 lr 0.075140	time 2.3874 (2.4054)	model_time 2.3871 (2.3983)	loss 6.9041 (6.9064)	grad_norm 0.0396 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:43:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][480/1251]	eta 0:30:54 lr 0.076739	time 2.3928 (2.4051)	model_time 2.3925 (2.3981)	loss 6.9035 (6.9063)	grad_norm 0.0395 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:43:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][490/1251]	eta 0:30:30 lr 0.078337	time 2.3923 (2.4048)	model_time 2.3921 (2.3980)	loss 6.9041 (6.9063)	grad_norm 0.0399 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:44:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][500/1251]	eta 0:30:05 lr 0.079936	time 2.3900 (2.4045)	model_time 2.3897 (2.3978)	loss 6.9053 (6.9062)	grad_norm 0.0400 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:44:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][510/1251]	eta 0:29:41 lr 0.081535	time 2.3916 (2.4042)	model_time 2.3914 (2.3977)	loss 6.9032 (6.9062)	grad_norm 0.0414 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:45:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][520/1251]	eta 0:29:17 lr 0.083133	time 2.3923 (2.4040)	model_time 2.3918 (2.3975)	loss 6.9004 (6.9061)	grad_norm 0.0400 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:45:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][530/1251]	eta 0:28:53 lr 0.084732	time 2.3893 (2.4038)	model_time 2.3890 (2.3974)	loss 6.9040 (6.9061)	grad_norm 0.0387 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:45:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][540/1251]	eta 0:28:28 lr 0.086331	time 2.3919 (2.4035)	model_time 2.3916 (2.3973)	loss 6.9032 (6.9060)	grad_norm 0.0394 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:46:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][550/1251]	eta 0:28:04 lr 0.087930	time 2.3926 (2.4033)	model_time 2.3923 (2.3972)	loss 6.9045 (6.9059)	grad_norm 0.0404 (0.0400/0.0008)	mem 48463MB
[2023-11-09 22:46:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][560/1251]	eta 0:27:40 lr 0.089528	time 2.3913 (2.4031)	model_time 2.3909 (2.3971)	loss 6.9032 (6.9059)	grad_norm 0.0374 (0.0400/0.0009)	mem 48463MB
[2023-11-09 22:47:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][570/1251]	eta 0:27:16 lr 0.091127	time 2.3924 (2.4029)	model_time 2.3922 (2.3969)	loss 6.8996 (6.9058)	grad_norm 0.0417 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:47:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][580/1251]	eta 0:26:52 lr 0.092726	time 2.3924 (2.4027)	model_time 2.3921 (2.3968)	loss 6.9033 (6.9057)	grad_norm 0.0412 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:47:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][590/1251]	eta 0:26:28 lr 0.094325	time 2.3929 (2.4024)	model_time 2.3926 (2.3967)	loss 6.9051 (6.9057)	grad_norm 0.0402 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:48:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][600/1251]	eta 0:26:03 lr 0.095923	time 2.3902 (2.4023)	model_time 2.3900 (2.3966)	loss 6.8966 (6.9056)	grad_norm 0.0380 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:48:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][610/1251]	eta 0:25:39 lr 0.097522	time 2.3922 (2.4021)	model_time 2.3919 (2.3965)	loss 6.9010 (6.9055)	grad_norm 0.0396 (0.0399/0.0008)	mem 48463MB
[2023-11-09 22:49:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 663): INFO Full config saved to work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/config.json
[2023-11-09 22:49:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 666): INFO AMP_OPT_LEVEL: O1
AMP_TYPE: float16
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 1.0
  CUTMIX_MINMAX: null
  MEAN:
  - 0.485
  - 0.456
  - 0.406
  MIXUP: 0.8
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RANDOM_RESIZED_CROP: false
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
  STD:
  - 0.229
  - 0.224
  - 0.225
BASE:
- ''
DATA:
  BATCH_SIZE: 128
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /mnt/petrelfs/share/images
  IMG_ON_MEMORY: false
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  TRANSFORM: build_transform_for_linear_probe
  ZIP_MODE: false
EVAL_22K_TO_1K: false
EVAL_FREQ: 1
EVAL_MODE: false
LOCAL_RANK: 0
MODEL:
  DROP_PATH_RATE: 0.0
  DROP_PATH_TYPE: linear
  DROP_RATE: 0.0
  INTERN_IMAGE:
    CENTER_FEATURE_SCALE: false
    CHANNELS: 64
    CORE_OP: DCNv3
    DEPTHS:
    - 4
    - 4
    - 18
    - 4
    DW_KERNEL_SIZE: null
    GROUPS:
    - 4
    - 8
    - 16
    - 32
    LAYER_SCALE: null
    LEVEL2_POST_NORM: false
    LEVEL2_POST_NORM_BLOCK_IDS: null
    MLP_RATIO: 4.0
    OFFSET_SCALE: 1.0
    POST_NORM: false
    REMOVE_CENTER: false
    RES_POST_NORM: false
    USE_CLIP_PROJECTOR: false
  INTERN_VIT_6B:
    CLS_TARGET: cls_patch_concat
    DEPTH: 48
    EMBED_DIM: 3200
    FREEZE_VIT: true
    INIT_VALUES: 0.1
    MLP_RATIO: 4
    NUM_HEADS: 25
    OUT_INDICES:
    - 47
    PATCH_SIZE: 14
    PRETRAINED: ./pretrained/intern_vit_6b_224px.pth
    PRETRAIN_SIZE: 224
    QKV_BIAS: false
    QK_NORMALIZATION: true
    USE_FLASH_ATTN: true
  LABEL_SMOOTHING: 0.1
  NAME: intern_vit_6b_1k_224_cls_patch_sgd_lr0.1
  NUM_CLASSES: 1000
  PRETRAINED: ''
  RESUME: ''
  TYPE: intern_vit_6b
OUTPUT: work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1
PRINT_FREQ: 10
SAVE_CKPT_NUM: 1
SAVE_FREQ: 1
SEED: 0
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 1
  AUTO_RESUME: true
  BASE_LR: 0.2
  CLIP_GRAD: 5.0
  EMA:
    DECAY: 0.998
    ENABLE: true
  EPOCHS: 10
  LR_LAYER_DECAY: false
  LR_LAYER_DECAY_RATIO: 0.875
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    NAME: cosine
  MIN_LR: 0.0
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    DCN_LR_MUL: null
    EPS: 1.0e-08
    FREEZE_BACKBONE: null
    MOMENTUM: 0.9
    NAME: sgd
    USE_ZERO: false
  RAND_INIT_FT_HEAD: false
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 1
  WARMUP_LR: 0.0
  WEIGHT_DECAY: 0.0

[2023-11-09 22:49:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 173): INFO Creating model:intern_vit_6b/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1
[2023-11-09 22:50:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 176): INFO InternViT6B(
  (patch_embed): PatchEmbed(
    (proj): Conv2d(3, 3200, kernel_size=(14, 14), stride=(14, 14))
    (norm): Identity()
  )
  (pos_drop): Identity()
  (blocks): ModuleList(
    (0): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (1): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (2): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (3): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (4): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (5): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (6): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (7): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (8): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (9): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (10): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (11): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (12): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (13): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (14): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (15): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (16): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (17): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (18): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (19): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (20): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (21): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (22): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (23): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (24): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (25): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (26): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (27): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (28): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (29): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (30): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (31): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (32): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (33): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (34): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (35): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (36): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (37): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (38): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (39): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (40): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (41): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (42): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (43): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (44): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (45): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (46): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (47): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
  )
  (clip_projector): AttentionPoolingBlock(
    (norm1_q): LayerNorm((3200,), eps=1e-05, elementwise_affine=True)
    (norm1_k): LayerNorm((3200,), eps=1e-05, elementwise_affine=True)
    (norm1_v): LayerNorm((3200,), eps=1e-05, elementwise_affine=True)
    (cross_attn): CrossAttention(
      (q): Linear(in_features=3200, out_features=3200, bias=False)
      (k): Linear(in_features=3200, out_features=3200, bias=False)
      (v): Linear(in_features=3200, out_features=3200, bias=False)
      (attn_drop): Dropout(p=0.0, inplace=False)
      (proj): Linear(in_features=3200, out_features=768, bias=True)
      (proj_drop): Dropout(p=0.0, inplace=False)
    )
    (drop_path): Identity()
  )
  (norm): SyncBatchNorm(6400, eps=1e-06, momentum=0.1, affine=True, track_running_stats=True)
  (head): Linear(in_features=6400, out_features=1000, bias=True)
)
[2023-11-09 22:50:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 212): INFO Using native Torch AMP. Training in mixed precision.
[2023-11-09 22:50:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 232): INFO number of params: 6413800
[2023-11-09 22:50:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 266): INFO no checkpoint found in work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1, ignoring auto resume
[2023-11-09 22:50:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 307): INFO Start training
[2023-11-09 22:50:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][0/1251]	eta 3:17:54 lr 0.000000	time 9.4921 (9.4921)	model_time 6.4074 (6.4074)	loss 7.2200 (7.2200)	grad_norm 2.3526 (2.3526/0.0000)	mem 48415MB
[2023-11-09 22:51:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][10/1251]	eta 1:02:32 lr 0.001599	time 2.3903 (3.0238)	model_time 2.3901 (2.7431)	loss 7.2357 (7.1931)	grad_norm 2.3168 (2.3043/0.0489)	mem 48464MB
[2023-11-09 22:51:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][20/1251]	eta 0:55:52 lr 0.003197	time 2.3962 (2.7232)	model_time 2.3959 (2.5760)	loss 7.0314 (7.1526)	grad_norm 2.2741 (2.3010/0.0468)	mem 48464MB
[2023-11-09 22:52:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][30/1251]	eta 0:53:15 lr 0.004796	time 2.3949 (2.6172)	model_time 2.3947 (2.5174)	loss 6.4926 (7.0576)	grad_norm 2.4111 (2.3011/0.0497)	mem 48464MB
[2023-11-09 22:52:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][40/1251]	eta 0:51:43 lr 0.006395	time 2.3929 (2.5626)	model_time 2.3927 (2.4870)	loss 6.1765 (6.8862)	grad_norm 2.3002 (2.2970/0.0503)	mem 48464MB
[2023-11-09 22:52:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][50/1251]	eta 0:50:38 lr 0.007994	time 2.3965 (2.5297)	model_time 2.3963 (2.4688)	loss 5.8360 (6.6860)	grad_norm 2.1085 (2.2785/0.0613)	mem 48464MB
[2023-11-09 22:53:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][60/1251]	eta 0:49:50 lr 0.009592	time 2.3921 (2.5111)	model_time 2.3917 (2.4601)	loss 4.9965 (6.4310)	grad_norm 2.1474 (2.2510/0.0850)	mem 48464MB
[2023-11-09 22:53:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][70/1251]	eta 0:49:06 lr 0.011191	time 2.3945 (2.4947)	model_time 2.3943 (2.4509)	loss 4.7077 (6.1750)	grad_norm 1.9435 (2.2120/0.1254)	mem 48464MB
[2023-11-09 22:54:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][80/1251]	eta 0:48:26 lr 0.012790	time 2.3915 (2.4823)	model_time 2.3913 (2.4438)	loss 3.6265 (5.9485)	grad_norm 1.7240 (2.1623/0.1781)	mem 48464MB
[2023-11-09 22:54:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][90/1251]	eta 0:47:50 lr 0.014388	time 2.3976 (2.4726)	model_time 2.3973 (2.4383)	loss 4.7333 (5.7365)	grad_norm 1.5958 (2.1106/0.2242)	mem 48464MB
[2023-11-09 22:54:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][100/1251]	eta 0:47:16 lr 0.015987	time 2.3901 (2.4647)	model_time 2.3899 (2.4338)	loss 4.2756 (5.5739)	grad_norm 1.6641 (2.0617/0.2595)	mem 48464MB
[2023-11-09 22:55:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][110/1251]	eta 0:46:44 lr 0.017586	time 2.3931 (2.4583)	model_time 2.3929 (2.4301)	loss 2.9538 (5.4218)	grad_norm 1.5042 (2.0160/0.2874)	mem 48464MB
[2023-11-09 22:55:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][120/1251]	eta 0:46:14 lr 0.019185	time 2.3920 (2.4529)	model_time 2.3918 (2.4270)	loss 2.2351 (5.2853)	grad_norm 1.5735 (1.9763/0.3057)	mem 48464MB
[2023-11-09 22:56:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][130/1251]	eta 0:45:44 lr 0.020783	time 2.3908 (2.4484)	model_time 2.3906 (2.4245)	loss 4.2635 (5.1561)	grad_norm 1.5457 (1.9401/0.3199)	mem 48464MB
[2023-11-09 22:56:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][140/1251]	eta 0:45:15 lr 0.022382	time 2.3956 (2.4446)	model_time 2.3954 (2.4223)	loss 3.6315 (5.0316)	grad_norm 1.4352 (1.9090/0.3288)	mem 48464MB
[2023-11-09 22:56:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][150/1251]	eta 0:44:47 lr 0.023981	time 2.3925 (2.4412)	model_time 2.3922 (2.4204)	loss 2.6320 (4.9259)	grad_norm 1.4959 (1.8809/0.3351)	mem 48464MB
[2023-11-09 22:57:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][160/1251]	eta 0:44:20 lr 0.025580	time 2.3943 (2.4381)	model_time 2.3941 (2.4186)	loss 3.9721 (4.8499)	grad_norm 1.5352 (1.8565/0.3385)	mem 48464MB
[2023-11-09 22:57:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][170/1251]	eta 0:43:52 lr 0.027178	time 2.3943 (2.4355)	model_time 2.3941 (2.4171)	loss 3.4090 (4.7667)	grad_norm 1.4050 (1.8335/0.3418)	mem 48464MB
[2023-11-09 22:58:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][180/1251]	eta 0:43:25 lr 0.028777	time 2.3936 (2.4332)	model_time 2.3934 (2.4158)	loss 3.6979 (4.6768)	grad_norm 1.5120 (1.8109/0.3455)	mem 48464MB
[2023-11-09 22:58:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][190/1251]	eta 0:42:59 lr 0.030376	time 2.3914 (2.4311)	model_time 2.3912 (2.4146)	loss 2.9698 (4.6107)	grad_norm 1.3750 (1.7923/0.3464)	mem 48464MB
[2023-11-09 22:58:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][200/1251]	eta 0:42:33 lr 0.031974	time 2.3911 (2.4292)	model_time 2.3909 (2.4135)	loss 3.3424 (4.5403)	grad_norm 1.4216 (1.7735/0.3479)	mem 48464MB
[2023-11-09 22:59:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][210/1251]	eta 0:42:07 lr 0.033573	time 2.3943 (2.4276)	model_time 2.3941 (2.4126)	loss 3.5985 (4.4838)	grad_norm 1.3453 (1.7563/0.3484)	mem 48464MB
[2023-11-09 22:59:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][220/1251]	eta 0:41:41 lr 0.035172	time 2.3971 (2.4261)	model_time 2.3968 (2.4118)	loss 3.9187 (4.4301)	grad_norm 1.3655 (1.7426/0.3467)	mem 48464MB
[2023-11-09 23:00:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][230/1251]	eta 0:41:15 lr 0.036771	time 2.3912 (2.4247)	model_time 2.3910 (2.4110)	loss 1.9777 (4.3835)	grad_norm 1.4077 (1.7304/0.3442)	mem 48464MB
[2023-11-09 23:00:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][240/1251]	eta 0:40:50 lr 0.038369	time 2.3915 (2.4234)	model_time 2.3913 (2.4103)	loss 3.6814 (4.3441)	grad_norm 1.3453 (1.7185/0.3425)	mem 48464MB
[2023-11-09 23:00:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][250/1251]	eta 0:40:24 lr 0.039968	time 2.3945 (2.4222)	model_time 2.3943 (2.4096)	loss 3.6980 (4.2927)	grad_norm 1.4719 (1.7063/0.3410)	mem 48464MB
[2023-11-09 23:01:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][260/1251]	eta 0:39:59 lr 0.041567	time 2.3903 (2.4212)	model_time 2.3900 (2.4090)	loss 3.5196 (4.2461)	grad_norm 1.4799 (1.6960/0.3386)	mem 48464MB
[2023-11-09 23:01:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][270/1251]	eta 0:39:34 lr 0.043165	time 2.3948 (2.4202)	model_time 2.3946 (2.4084)	loss 4.0628 (4.2070)	grad_norm 1.3421 (1.6854/0.3369)	mem 48464MB
[2023-11-09 23:02:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][280/1251]	eta 0:39:09 lr 0.044764	time 2.3948 (2.4192)	model_time 2.3945 (2.4078)	loss 3.2744 (4.1794)	grad_norm 1.5095 (1.6766/0.3342)	mem 48464MB
[2023-11-09 23:02:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][290/1251]	eta 0:38:44 lr 0.046363	time 2.3919 (2.4184)	model_time 2.3915 (2.4073)	loss 1.8651 (4.1432)	grad_norm 1.4163 (1.6683/0.3318)	mem 48464MB
[2023-11-09 23:02:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][300/1251]	eta 0:38:19 lr 0.047962	time 2.3919 (2.4176)	model_time 2.3916 (2.4069)	loss 3.0746 (4.1036)	grad_norm 1.4907 (1.6582/0.3272)	mem 48464MB
[2023-11-09 23:03:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][310/1251]	eta 0:37:54 lr 0.049560	time 2.3929 (2.4168)	model_time 2.3927 (2.4065)	loss 3.7970 (4.0770)	grad_norm 1.4474 (1.6323/0.3057)	mem 48464MB
[2023-11-09 23:03:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][320/1251]	eta 0:37:29 lr 0.051159	time 2.3997 (2.4161)	model_time 2.3995 (2.4061)	loss 4.3810 (4.0667)	grad_norm 1.4694 (1.6034/0.2815)	mem 48464MB
[2023-11-09 23:04:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][330/1251]	eta 0:37:04 lr 0.052758	time 2.3953 (2.4155)	model_time 2.3950 (2.4058)	loss 2.0437 (4.0428)	grad_norm 1.3703 (1.5747/0.2512)	mem 48464MB
[2023-11-09 23:04:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][340/1251]	eta 0:36:39 lr 0.054357	time 2.3965 (2.4149)	model_time 2.3963 (2.4054)	loss 3.6978 (4.0241)	grad_norm 1.3745 (1.5470/0.2148)	mem 48464MB
[2023-11-09 23:04:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][350/1251]	eta 0:36:15 lr 0.055955	time 2.3940 (2.4142)	model_time 2.3937 (2.4050)	loss 3.8836 (4.0028)	grad_norm 1.5197 (1.5221/0.1784)	mem 48464MB
[2023-11-09 23:05:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][360/1251]	eta 0:35:50 lr 0.057554	time 2.3950 (2.4137)	model_time 2.3948 (2.4047)	loss 2.7814 (3.9845)	grad_norm 1.4293 (1.5005/0.1415)	mem 48464MB
[2023-11-09 23:05:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][370/1251]	eta 0:35:25 lr 0.059153	time 2.3973 (2.4131)	model_time 2.3972 (2.4044)	loss 2.3149 (3.9623)	grad_norm 1.3160 (1.4824/0.1122)	mem 48464MB
[2023-11-09 23:06:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][380/1251]	eta 0:35:01 lr 0.060751	time 2.3900 (2.4126)	model_time 2.3898 (2.4041)	loss 1.9566 (3.9425)	grad_norm 1.3874 (1.4694/0.0951)	mem 48464MB
[2023-11-09 23:06:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][390/1251]	eta 0:34:36 lr 0.062350	time 2.3964 (2.4121)	model_time 2.3962 (2.4038)	loss 4.3048 (3.9219)	grad_norm 1.3678 (1.4606/0.0867)	mem 48464MB
[2023-11-09 23:06:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][400/1251]	eta 0:34:12 lr 0.063949	time 2.3928 (2.4116)	model_time 2.3926 (2.4035)	loss 3.7303 (3.9125)	grad_norm 1.6029 (1.4550/0.0828)	mem 48464MB
[2023-11-09 23:07:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][410/1251]	eta 0:33:47 lr 0.065548	time 2.3952 (2.4112)	model_time 2.3950 (2.4033)	loss 3.2414 (3.9034)	grad_norm 1.4472 (1.4519/0.0811)	mem 48464MB
[2023-11-09 23:07:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][420/1251]	eta 0:33:23 lr 0.067146	time 2.3923 (2.4108)	model_time 2.3920 (2.4030)	loss 4.2687 (3.8867)	grad_norm 1.5390 (1.4485/0.0803)	mem 48464MB
[2023-11-09 23:08:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][430/1251]	eta 0:32:58 lr 0.068745	time 2.3930 (2.4104)	model_time 2.3928 (2.4028)	loss 3.7570 (3.8793)	grad_norm 1.4743 (1.4472/0.0811)	mem 48464MB
[2023-11-09 23:08:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][440/1251]	eta 0:32:34 lr 0.070344	time 2.3939 (2.4100)	model_time 2.3937 (2.4026)	loss 4.1466 (3.8712)	grad_norm 1.3744 (1.4460/0.0805)	mem 48464MB
[2023-11-09 23:08:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][450/1251]	eta 0:32:10 lr 0.071942	time 2.3944 (2.4097)	model_time 2.3942 (2.4024)	loss 3.1706 (3.8568)	grad_norm 1.4301 (1.4439/0.0809)	mem 48464MB
[2023-11-09 23:09:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][460/1251]	eta 0:31:45 lr 0.073541	time 2.3941 (2.4094)	model_time 2.3937 (2.4022)	loss 3.4454 (3.8371)	grad_norm 1.4868 (1.4413/0.0806)	mem 48464MB
[2023-11-09 23:09:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][470/1251]	eta 0:31:21 lr 0.075140	time 2.3888 (2.4090)	model_time 2.3885 (2.4020)	loss 3.3977 (3.8161)	grad_norm 1.4823 (1.4397/0.0798)	mem 48464MB
[2023-11-09 23:10:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][480/1251]	eta 0:30:57 lr 0.076739	time 2.3922 (2.4087)	model_time 2.3918 (2.4018)	loss 3.8245 (3.8036)	grad_norm 1.4049 (1.4398/0.0805)	mem 48464MB
[2023-11-09 23:10:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][490/1251]	eta 0:30:32 lr 0.078337	time 2.3935 (2.4084)	model_time 2.3931 (2.4017)	loss 3.2544 (3.7990)	grad_norm 1.5426 (1.4405/0.0795)	mem 48464MB
[2023-11-09 23:10:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][500/1251]	eta 0:30:08 lr 0.079936	time 2.3930 (2.4081)	model_time 2.3927 (2.4015)	loss 3.6975 (3.7863)	grad_norm 1.3675 (1.4413/0.0795)	mem 48464MB
[2023-11-09 23:11:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][510/1251]	eta 0:29:44 lr 0.081535	time 2.3940 (2.4078)	model_time 2.3936 (2.4014)	loss 4.4658 (3.7815)	grad_norm 1.4879 (1.4424/0.0811)	mem 48464MB
[2023-11-09 23:11:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][520/1251]	eta 0:29:19 lr 0.083133	time 2.3941 (2.4076)	model_time 2.3938 (2.4012)	loss 3.2094 (3.7747)	grad_norm 1.4096 (1.4413/0.0819)	mem 48464MB
[2023-11-09 23:12:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][530/1251]	eta 0:28:55 lr 0.084732	time 2.3958 (2.4073)	model_time 2.3956 (2.4011)	loss 3.4954 (3.7597)	grad_norm 1.5384 (1.4409/0.0817)	mem 48464MB
[2023-11-09 23:12:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][540/1251]	eta 0:28:31 lr 0.086331	time 2.3946 (2.4071)	model_time 2.3943 (2.4009)	loss 3.4831 (3.7514)	grad_norm 1.4517 (1.4400/0.0802)	mem 48464MB
[2023-11-09 23:12:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][550/1251]	eta 0:28:07 lr 0.087930	time 2.3978 (2.4069)	model_time 2.3976 (2.4008)	loss 4.2883 (3.7435)	grad_norm 1.3769 (1.4404/0.0812)	mem 48464MB
[2023-11-09 23:13:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][560/1251]	eta 0:27:43 lr 0.089528	time 2.3936 (2.4067)	model_time 2.3934 (2.4007)	loss 3.1646 (3.7403)	grad_norm 1.3448 (1.4412/0.0835)	mem 48464MB
[2023-11-09 23:13:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][570/1251]	eta 0:27:18 lr 0.091127	time 2.3912 (2.4065)	model_time 2.3908 (2.4006)	loss 3.3961 (3.7344)	grad_norm 1.4042 (1.4432/0.0840)	mem 48464MB
[2023-11-09 23:14:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][580/1251]	eta 0:26:54 lr 0.092726	time 2.3941 (2.4063)	model_time 2.3938 (2.4005)	loss 4.0971 (3.7298)	grad_norm 1.4044 (1.4427/0.0840)	mem 48464MB
[2023-11-09 23:14:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][590/1251]	eta 0:26:30 lr 0.094325	time 2.3916 (2.4061)	model_time 2.3913 (2.4004)	loss 3.5507 (3.7242)	grad_norm 1.5176 (1.4418/0.0838)	mem 48464MB
[2023-11-09 23:14:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][600/1251]	eta 0:26:06 lr 0.095923	time 2.3934 (2.4059)	model_time 2.3932 (2.4003)	loss 1.7957 (3.7151)	grad_norm 1.4000 (1.4404/0.0842)	mem 48464MB
[2023-11-09 23:15:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][610/1251]	eta 0:25:42 lr 0.097522	time 2.3975 (2.4060)	model_time 2.3973 (2.4005)	loss 3.0410 (3.7084)	grad_norm 1.4147 (1.4379/0.0820)	mem 48464MB
[2023-11-09 23:15:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][620/1251]	eta 0:25:18 lr 0.099121	time 2.3937 (2.4058)	model_time 2.3935 (2.4004)	loss 4.2440 (3.7054)	grad_norm 1.3875 (1.4383/0.0827)	mem 48464MB
[2023-11-09 23:16:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][630/1251]	eta 0:24:53 lr 0.100719	time 2.3929 (2.4056)	model_time 2.3927 (2.4002)	loss 3.7376 (3.7012)	grad_norm 1.5354 (1.4391/0.0835)	mem 48464MB
[2023-11-09 23:16:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][640/1251]	eta 0:24:29 lr 0.102318	time 2.3930 (2.4054)	model_time 2.3928 (2.4001)	loss 2.2937 (3.6927)	grad_norm 1.3605 (1.4384/0.0838)	mem 48464MB
[2023-11-09 23:16:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][650/1251]	eta 0:24:05 lr 0.103917	time 2.3942 (2.4052)	model_time 2.3939 (2.4000)	loss 4.7331 (3.6877)	grad_norm 1.8375 (1.4387/0.0856)	mem 48464MB
[2023-11-09 23:17:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][660/1251]	eta 0:23:41 lr 0.105516	time 2.3941 (2.4051)	model_time 2.3939 (2.3999)	loss 2.0212 (3.6756)	grad_norm 1.5023 (1.4371/0.0855)	mem 48464MB
[2023-11-09 23:17:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][670/1251]	eta 0:23:17 lr 0.107114	time 2.3951 (2.4049)	model_time 2.3948 (2.3999)	loss 4.7244 (3.6762)	grad_norm 1.6434 (1.4382/0.0861)	mem 48464MB
[2023-11-09 23:18:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][680/1251]	eta 0:22:53 lr 0.108713	time 2.3997 (2.4048)	model_time 2.3994 (2.3998)	loss 4.0787 (3.6721)	grad_norm 1.6015 (1.4385/0.0865)	mem 48464MB
[2023-11-09 23:18:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][690/1251]	eta 0:22:28 lr 0.110312	time 2.3981 (2.4046)	model_time 2.3979 (2.3997)	loss 2.3769 (3.6637)	grad_norm 1.4323 (1.4392/0.0861)	mem 48464MB
[2023-11-09 23:18:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][700/1251]	eta 0:22:04 lr 0.111910	time 2.3950 (2.4045)	model_time 2.3947 (2.3996)	loss 3.6395 (3.6561)	grad_norm 1.4781 (1.4394/0.0849)	mem 48464MB
[2023-11-09 23:19:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][710/1251]	eta 0:21:40 lr 0.113509	time 2.3945 (2.4043)	model_time 2.3943 (2.3996)	loss 1.8167 (3.6501)	grad_norm 1.3902 (1.4389/0.0848)	mem 48464MB
[2023-11-09 23:19:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][720/1251]	eta 0:21:16 lr 0.115108	time 2.3910 (2.4042)	model_time 2.3908 (2.3995)	loss 4.2818 (3.6435)	grad_norm 1.4191 (1.4388/0.0844)	mem 48464MB
[2023-11-09 23:20:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][730/1251]	eta 0:20:52 lr 0.116707	time 2.3953 (2.4040)	model_time 2.3950 (2.3994)	loss 3.7453 (3.6407)	grad_norm 1.3195 (1.4378/0.0838)	mem 48464MB
[2023-11-09 23:20:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][740/1251]	eta 0:20:28 lr 0.118305	time 2.3946 (2.4039)	model_time 2.3943 (2.3993)	loss 3.4952 (3.6413)	grad_norm 1.5039 (1.4373/0.0831)	mem 48464MB
[2023-11-09 23:20:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][750/1251]	eta 0:20:04 lr 0.119904	time 2.3962 (2.4037)	model_time 2.3960 (2.3992)	loss 3.7774 (3.6355)	grad_norm 1.4827 (1.4394/0.0833)	mem 48464MB
[2023-11-09 23:21:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][760/1251]	eta 0:19:40 lr 0.121503	time 2.3923 (2.4036)	model_time 2.3921 (2.3991)	loss 3.8312 (3.6325)	grad_norm 1.5481 (1.4410/0.0832)	mem 48464MB
[2023-11-09 23:21:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][770/1251]	eta 0:19:16 lr 0.123102	time 2.3956 (2.4035)	model_time 2.3953 (2.3991)	loss 4.0607 (3.6305)	grad_norm 1.4791 (1.4403/0.0840)	mem 48464MB
[2023-11-09 23:22:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][780/1251]	eta 0:18:51 lr 0.124700	time 2.3993 (2.4034)	model_time 2.3991 (2.3990)	loss 4.2360 (3.6273)	grad_norm 1.4791 (1.4424/0.0841)	mem 48464MB
[2023-11-09 23:22:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][790/1251]	eta 0:18:27 lr 0.126299	time 2.3932 (2.4033)	model_time 2.3928 (2.3989)	loss 3.1275 (3.6227)	grad_norm 1.3476 (1.4398/0.0830)	mem 48464MB
[2023-11-09 23:22:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][800/1251]	eta 0:18:03 lr 0.127898	time 2.3928 (2.4032)	model_time 2.3924 (2.3989)	loss 4.0833 (3.6202)	grad_norm 1.4189 (1.4410/0.0833)	mem 48464MB
[2023-11-09 23:23:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][810/1251]	eta 0:17:39 lr 0.129496	time 2.3942 (2.4031)	model_time 2.3940 (2.3988)	loss 2.5139 (3.6104)	grad_norm 1.4345 (1.4403/0.0816)	mem 48464MB
[2023-11-09 23:23:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][820/1251]	eta 0:17:15 lr 0.131095	time 2.3957 (2.4030)	model_time 2.3955 (2.3988)	loss 3.9243 (3.6091)	grad_norm 1.4761 (1.4410/0.0812)	mem 48464MB
[2023-11-09 23:24:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][830/1251]	eta 0:16:51 lr 0.132694	time 2.3915 (2.4031)	model_time 2.3913 (2.3989)	loss 2.6450 (3.6018)	grad_norm 1.3366 (1.4386/0.0820)	mem 48464MB
[2023-11-09 23:24:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][840/1251]	eta 0:16:27 lr 0.134293	time 2.3989 (2.4030)	model_time 2.3986 (2.3989)	loss 3.7931 (3.5994)	grad_norm 1.5014 (1.4397/0.0830)	mem 48464MB
[2023-11-09 23:24:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][850/1251]	eta 0:16:03 lr 0.135891	time 2.3982 (2.4029)	model_time 2.3980 (2.3988)	loss 4.2099 (3.5972)	grad_norm 1.4311 (1.4405/0.0829)	mem 48464MB
[2023-11-09 23:25:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][860/1251]	eta 0:15:39 lr 0.137490	time 2.3920 (2.4028)	model_time 2.3917 (2.3988)	loss 3.7189 (3.5962)	grad_norm 1.4640 (1.4388/0.0809)	mem 48464MB
[2023-11-09 23:25:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][870/1251]	eta 0:15:15 lr 0.139089	time 2.3964 (2.4027)	model_time 2.3962 (2.3987)	loss 3.2840 (3.5943)	grad_norm 1.4766 (1.4387/0.0799)	mem 48464MB
[2023-11-09 23:26:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][880/1251]	eta 0:14:51 lr 0.140687	time 2.3907 (2.4026)	model_time 2.3904 (2.3987)	loss 4.1345 (3.5897)	grad_norm 1.5141 (1.4385/0.0799)	mem 48464MB
[2023-11-09 23:26:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][890/1251]	eta 0:14:27 lr 0.142286	time 2.3908 (2.4025)	model_time 2.3906 (2.3986)	loss 2.5757 (3.5859)	grad_norm 1.6022 (1.4409/0.0804)	mem 48464MB
[2023-11-09 23:26:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][900/1251]	eta 0:14:03 lr 0.143885	time 2.3963 (2.4024)	model_time 2.3960 (2.3985)	loss 3.5509 (3.5845)	grad_norm 1.5672 (1.4432/0.0805)	mem 48464MB
[2023-11-09 23:27:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][910/1251]	eta 0:13:39 lr 0.145484	time 2.3937 (2.4023)	model_time 2.3935 (2.3985)	loss 2.9068 (3.5830)	grad_norm 1.4776 (1.4413/0.0814)	mem 48464MB
[2023-11-09 23:27:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][920/1251]	eta 0:13:15 lr 0.147082	time 2.3949 (2.4022)	model_time 2.3947 (2.3985)	loss 4.3440 (3.5838)	grad_norm 1.5263 (1.4424/0.0820)	mem 48464MB
[2023-11-09 23:28:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][930/1251]	eta 0:12:51 lr 0.148681	time 2.3954 (2.4021)	model_time 2.3952 (2.3984)	loss 3.7094 (3.5861)	grad_norm 1.3061 (1.4410/0.0816)	mem 48464MB
[2023-11-09 23:28:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][940/1251]	eta 0:12:27 lr 0.150280	time 2.3971 (2.4021)	model_time 2.3968 (2.3984)	loss 3.8030 (3.5828)	grad_norm 1.3766 (1.4400/0.0810)	mem 48464MB
[2023-11-09 23:28:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][950/1251]	eta 0:12:02 lr 0.151878	time 2.3924 (2.4020)	model_time 2.3921 (2.3983)	loss 2.5608 (3.5748)	grad_norm 1.4197 (1.4381/0.0786)	mem 48464MB
[2023-11-09 23:29:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][960/1251]	eta 0:11:38 lr 0.153477	time 2.3955 (2.4019)	model_time 2.3952 (2.3983)	loss 3.9632 (3.5748)	grad_norm 1.4917 (1.4385/0.0788)	mem 48464MB
[2023-11-09 23:29:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][970/1251]	eta 0:11:14 lr 0.155076	time 2.3923 (2.4019)	model_time 2.3920 (2.3983)	loss 3.1350 (3.5728)	grad_norm 1.4282 (1.4368/0.0773)	mem 48464MB
[2023-11-09 23:30:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][980/1251]	eta 0:10:50 lr 0.156675	time 2.3938 (2.4018)	model_time 2.3935 (2.3982)	loss 3.7047 (3.5721)	grad_norm 1.2823 (1.4356/0.0774)	mem 48464MB
[2023-11-09 23:30:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][990/1251]	eta 0:10:26 lr 0.158273	time 2.3913 (2.4017)	model_time 2.3910 (2.3982)	loss 3.4212 (3.5712)	grad_norm 1.2990 (1.4349/0.0788)	mem 48464MB
[2023-11-09 23:30:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1000/1251]	eta 0:10:02 lr 0.159872	time 2.3927 (2.4016)	model_time 2.3925 (2.3981)	loss 3.1293 (3.5689)	grad_norm 1.4071 (1.4325/0.0793)	mem 48464MB
[2023-11-09 23:31:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1010/1251]	eta 0:09:38 lr 0.161471	time 2.3915 (2.4015)	model_time 2.3913 (2.3980)	loss 3.6185 (3.5700)	grad_norm 1.4679 (1.4316/0.0797)	mem 48464MB
[2023-11-09 23:31:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1020/1251]	eta 0:09:14 lr 0.163070	time 2.3953 (2.4014)	model_time 2.3951 (2.3980)	loss 4.2821 (3.5686)	grad_norm 1.3817 (1.4302/0.0800)	mem 48464MB
[2023-11-09 23:32:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1030/1251]	eta 0:08:50 lr 0.164668	time 2.3921 (2.4014)	model_time 2.3919 (2.3980)	loss 3.5010 (3.5627)	grad_norm 1.4129 (1.4293/0.0797)	mem 48464MB
[2023-11-09 23:32:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1040/1251]	eta 0:08:26 lr 0.166267	time 2.3957 (2.4013)	model_time 2.3955 (2.3979)	loss 4.2256 (3.5646)	grad_norm 1.3426 (1.4264/0.0814)	mem 48464MB
[2023-11-09 23:32:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1050/1251]	eta 0:08:02 lr 0.167866	time 2.3929 (2.4013)	model_time 2.3927 (2.3979)	loss 2.4629 (3.5635)	grad_norm 1.3696 (1.4245/0.0813)	mem 48464MB
[2023-11-09 23:33:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1060/1251]	eta 0:07:38 lr 0.169464	time 2.3928 (2.4012)	model_time 2.3926 (2.3979)	loss 2.5199 (3.5636)	grad_norm 1.4536 (1.4228/0.0814)	mem 48464MB
[2023-11-09 23:33:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1070/1251]	eta 0:07:14 lr 0.171063	time 2.3953 (2.4011)	model_time 2.3949 (2.3978)	loss 3.9027 (3.5621)	grad_norm 1.4341 (1.4216/0.0819)	mem 48464MB
[2023-11-09 23:34:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1080/1251]	eta 0:06:50 lr 0.172662	time 2.3921 (2.4011)	model_time 2.3919 (2.3978)	loss 2.3170 (3.5595)	grad_norm 1.3876 (1.4168/0.0814)	mem 48464MB
[2023-11-09 23:34:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1090/1251]	eta 0:06:26 lr 0.174261	time 2.3890 (2.4010)	model_time 2.3888 (2.3977)	loss 3.3832 (3.5575)	grad_norm 1.3557 (1.4159/0.0823)	mem 48464MB
[2023-11-09 23:34:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1100/1251]	eta 0:06:02 lr 0.175859	time 2.3953 (2.4009)	model_time 2.3950 (2.3977)	loss 3.8735 (3.5564)	grad_norm 1.4431 (1.4124/0.0822)	mem 48464MB
[2023-11-09 23:35:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1110/1251]	eta 0:05:38 lr 0.177458	time 2.3916 (2.4009)	model_time 2.3914 (2.3977)	loss 4.0899 (3.5562)	grad_norm 1.3568 (1.4110/0.0833)	mem 48464MB
[2023-11-09 23:35:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1120/1251]	eta 0:05:14 lr 0.179057	time 2.3931 (2.4008)	model_time 2.3928 (2.3976)	loss 2.3227 (3.5566)	grad_norm 1.3128 (1.4103/0.0827)	mem 48464MB
[2023-11-09 23:36:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1130/1251]	eta 0:04:50 lr 0.180655	time 2.3952 (2.4007)	model_time 2.3950 (2.3976)	loss 4.3401 (3.5575)	grad_norm 1.5388 (1.4100/0.0840)	mem 48464MB
[2023-11-09 23:36:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1140/1251]	eta 0:04:26 lr 0.182254	time 2.3932 (2.4007)	model_time 2.3929 (2.3976)	loss 2.5243 (3.5541)	grad_norm 1.2125 (1.4065/0.0845)	mem 48464MB
[2023-11-09 23:36:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1150/1251]	eta 0:04:02 lr 0.183853	time 2.3907 (2.4006)	model_time 2.3905 (2.3975)	loss 3.8274 (3.5536)	grad_norm 1.4726 (1.4025/0.0846)	mem 48464MB
[2023-11-09 23:37:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1160/1251]	eta 0:03:38 lr 0.185452	time 2.3943 (2.4006)	model_time 2.3941 (2.3975)	loss 4.1251 (3.5530)	grad_norm 1.3427 (1.4011/0.0846)	mem 48464MB
[2023-11-09 23:37:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1170/1251]	eta 0:03:14 lr 0.187050	time 2.3932 (2.4005)	model_time 2.3929 (2.3974)	loss 3.3537 (3.5527)	grad_norm 1.3489 (1.3959/0.0854)	mem 48464MB
[2023-11-09 23:38:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1180/1251]	eta 0:02:50 lr 0.188649	time 2.3921 (2.4004)	model_time 2.3918 (2.3974)	loss 3.2365 (3.5548)	grad_norm 1.3225 (1.3929/0.0861)	mem 48464MB
[2023-11-09 23:38:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1190/1251]	eta 0:02:26 lr 0.190248	time 2.3916 (2.4004)	model_time 2.3913 (2.3974)	loss 2.2954 (3.5559)	grad_norm 1.3553 (1.3895/0.0841)	mem 48464MB
[2023-11-09 23:38:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1200/1251]	eta 0:02:02 lr 0.191847	time 2.3944 (2.4003)	model_time 2.3941 (2.3973)	loss 4.1627 (3.5563)	grad_norm 1.4021 (1.3845/0.0837)	mem 48464MB
[2023-11-09 23:39:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1210/1251]	eta 0:01:38 lr 0.193445	time 2.3930 (2.4003)	model_time 2.3928 (2.3973)	loss 2.3836 (3.5547)	grad_norm 1.2913 (1.3818/0.0846)	mem 48464MB
[2023-11-09 23:39:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1220/1251]	eta 0:01:14 lr 0.195044	time 2.3902 (2.4002)	model_time 2.3899 (2.3973)	loss 2.5093 (3.5541)	grad_norm 1.2369 (1.3762/0.0830)	mem 48464MB
[2023-11-09 23:40:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1230/1251]	eta 0:00:50 lr 0.196643	time 2.3937 (2.4002)	model_time 2.3935 (2.3972)	loss 3.8268 (3.5525)	grad_norm 1.3522 (1.3712/0.0841)	mem 48464MB
[2023-11-09 23:40:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1240/1251]	eta 0:00:26 lr 0.198241	time 2.3931 (2.4001)	model_time 2.3929 (2.3972)	loss 4.0939 (3.5512)	grad_norm 1.2641 (1.3679/0.0849)	mem 48464MB
[2023-11-09 23:40:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [0/10][1250/1251]	eta 0:00:02 lr 0.199840	time 2.3927 (2.4001)	model_time 2.3926 (2.3972)	loss 3.9837 (3.5486)	grad_norm 1.2758 (1.3640/0.0845)	mem 48464MB
[2023-11-09 23:40:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 515): INFO EPOCH 0 training takes 0:50:02
[2023-11-09 23:40:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_0.pth saving......
[2023-11-09 23:42:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_0.pth saved !!!
[2023-11-09 23:42:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 4.171 (4.171)	Loss 1.0752 (1.0752)	Acc@1 78.809 (78.809)	Acc@5 94.043 (94.043)	Mem 48464MB
[2023-11-09 23:43:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.247 (2.412)	Loss 1.1729 (1.0679)	Acc@1 78.125 (79.705)	Acc@5 92.969 (93.919)	Mem 48464MB
[2023-11-09 23:43:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.251 (2.335)	Loss 0.9478 (1.0693)	Acc@1 81.348 (79.646)	Acc@5 95.605 (93.862)	Mem 48464MB
[2023-11-09 23:44:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.249 (2.308)	Loss 1.0879 (1.0728)	Acc@1 78.320 (79.678)	Acc@5 94.141 (93.816)	Mem 48464MB
[2023-11-09 23:44:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.251 (2.294)	Loss 1.1484 (1.0729)	Acc@1 79.102 (79.545)	Acc@5 92.773 (93.788)	Mem 48464MB
[2023-11-09 23:44:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:0] * Acc@1 79.656 Acc@5 93.828
[2023-11-09 23:44:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 339): INFO Accuracy of the network on the 50000 test images: 79.7%
[2023-11-09 23:44:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saving......
[2023-11-09 23:46:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saved !!!
[2023-11-09 23:46:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 354): INFO Max accuracy: 79.66%
[2023-11-09 23:46:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 4.364 (4.364)	Loss 0.6348 (0.6348)	Acc@1 86.914 (86.914)	Acc@5 98.633 (98.633)	Mem 48464MB
[2023-11-09 23:46:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.248 (2.430)	Loss 0.7593 (0.6476)	Acc@1 84.961 (87.189)	Acc@5 96.875 (97.985)	Mem 48464MB
[2023-11-09 23:47:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.252 (2.344)	Loss 0.5518 (0.6445)	Acc@1 88.672 (87.212)	Acc@5 98.730 (98.014)	Mem 48464MB
[2023-11-09 23:47:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.253 (2.314)	Loss 0.6841 (0.6491)	Acc@1 85.059 (87.056)	Acc@5 97.461 (97.965)	Mem 48464MB
[2023-11-09 23:48:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.250 (2.299)	Loss 0.7085 (0.6510)	Acc@1 85.645 (87.021)	Acc@5 97.559 (97.942)	Mem 48464MB
[2023-11-09 23:48:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:0] * Acc@1 87.142 Acc@5 97.950
[2023-11-09 23:48:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 359): INFO Accuracy of the ema network on the 50000 test images: 87.1%
[2023-11-09 23:48:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saving......
[2023-11-09 23:50:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saved !!!
[2023-11-09 23:50:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 374): INFO Max ema accuracy: 87.14%
[2023-11-09 23:50:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][0/1251]	eta 1:18:10 lr 0.195106	time 3.7496 (3.7496)	model_time 2.3963 (2.3963)	loss 3.3919 (3.3919)	grad_norm 1.2838 (1.2838/0.0000)	mem 48464MB
[2023-11-09 23:50:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][10/1251]	eta 0:51:51 lr 0.195028	time 2.3911 (2.5072)	model_time 2.3907 (2.3838)	loss 3.5496 (3.4933)	grad_norm 1.3375 (1.3110/0.0540)	mem 48464MB
[2023-11-09 23:51:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][20/1251]	eta 0:50:28 lr 0.194949	time 2.3921 (2.4602)	model_time 2.3920 (2.3949)	loss 2.9969 (3.5241)	grad_norm 1.2052 (1.2736/0.0870)	mem 48464MB
[2023-11-09 23:51:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][30/1251]	eta 0:49:38 lr 0.194870	time 2.3983 (2.4395)	model_time 2.3979 (2.3951)	loss 3.7723 (3.6488)	grad_norm 1.3247 (1.2648/0.0833)	mem 48464MB
[2023-11-09 23:51:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][40/1251]	eta 0:49:01 lr 0.194790	time 2.4039 (2.4288)	model_time 2.4036 (2.3951)	loss 2.5005 (3.5941)	grad_norm 1.1770 (1.2631/0.0777)	mem 48464MB
[2023-11-09 23:52:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][50/1251]	eta 0:48:34 lr 0.194710	time 2.3938 (2.4270)	model_time 2.3936 (2.3998)	loss 2.5872 (3.5752)	grad_norm 1.1819 (1.2581/0.0783)	mem 48464MB
[2023-11-09 23:52:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][60/1251]	eta 0:48:04 lr 0.194629	time 2.3877 (2.4216)	model_time 2.3873 (2.3988)	loss 3.5571 (3.5288)	grad_norm 1.2119 (1.2562/0.0762)	mem 48464MB
[2023-11-09 23:53:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][70/1251]	eta 0:47:35 lr 0.194548	time 2.3980 (2.4176)	model_time 2.3977 (2.3980)	loss 2.7791 (3.4509)	grad_norm 1.1356 (1.2485/0.0767)	mem 48464MB
[2023-11-09 23:53:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][80/1251]	eta 0:47:07 lr 0.194466	time 2.3930 (2.4147)	model_time 2.3927 (2.3975)	loss 2.9793 (3.4635)	grad_norm 1.0767 (1.2429/0.0812)	mem 48464MB
[2023-11-09 23:53:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][90/1251]	eta 0:46:40 lr 0.194383	time 2.4001 (2.4123)	model_time 2.3998 (2.3969)	loss 2.1541 (3.4604)	grad_norm 1.1384 (1.2371/0.0800)	mem 48464MB
[2023-11-09 23:54:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][100/1251]	eta 0:46:14 lr 0.194300	time 2.3943 (2.4106)	model_time 2.3939 (2.3967)	loss 4.1066 (3.4727)	grad_norm 1.1935 (1.2317/0.0803)	mem 48464MB
[2023-11-09 23:54:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][110/1251]	eta 0:45:48 lr 0.194216	time 2.3895 (2.4089)	model_time 2.3893 (2.3962)	loss 3.3745 (3.4717)	grad_norm 1.1123 (1.2292/0.0797)	mem 48464MB
[2023-11-09 23:55:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][120/1251]	eta 0:45:23 lr 0.194131	time 2.3927 (2.4077)	model_time 2.3924 (2.3960)	loss 3.5202 (3.4764)	grad_norm 1.0858 (1.2287/0.0804)	mem 48464MB
[2023-11-09 23:55:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][130/1251]	eta 0:44:57 lr 0.194046	time 2.3945 (2.4066)	model_time 2.3942 (2.3958)	loss 3.1991 (3.4685)	grad_norm 1.2201 (1.2263/0.0793)	mem 48464MB
[2023-11-09 23:55:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][140/1251]	eta 0:44:32 lr 0.193961	time 2.3884 (2.4058)	model_time 2.3882 (2.3955)	loss 2.2860 (3.4623)	grad_norm 1.0704 (1.2215/0.0798)	mem 48464MB
[2023-11-09 23:56:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][150/1251]	eta 0:44:07 lr 0.193874	time 2.3935 (2.4049)	model_time 2.3932 (2.3952)	loss 4.5494 (3.4687)	grad_norm 1.1906 (1.2189/0.0795)	mem 48464MB
[2023-11-09 23:56:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][160/1251]	eta 0:43:42 lr 0.193788	time 2.3941 (2.4041)	model_time 2.3937 (2.3950)	loss 3.1267 (3.4793)	grad_norm 1.1389 (1.2148/0.0809)	mem 48464MB
[2023-11-09 23:57:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][170/1251]	eta 0:43:18 lr 0.193700	time 2.3911 (2.4034)	model_time 2.3909 (2.3948)	loss 3.7502 (3.4638)	grad_norm 1.1107 (1.2093/0.0821)	mem 48464MB
[2023-11-09 23:57:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][180/1251]	eta 0:42:53 lr 0.193612	time 2.3943 (2.4028)	model_time 2.3941 (2.3947)	loss 3.9201 (3.4616)	grad_norm 1.0974 (1.2059/0.0820)	mem 48464MB
[2023-11-09 23:57:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][190/1251]	eta 0:42:29 lr 0.193524	time 2.3951 (2.4033)	model_time 2.3948 (2.3955)	loss 4.2893 (3.4481)	grad_norm 1.1328 (1.2041/0.0832)	mem 48464MB
[2023-11-09 23:58:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][200/1251]	eta 0:42:05 lr 0.193434	time 2.3897 (2.4027)	model_time 2.3894 (2.3953)	loss 3.7146 (3.4520)	grad_norm 1.0485 (1.2023/0.0843)	mem 48464MB
[2023-11-09 23:58:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][210/1251]	eta 0:41:40 lr 0.193345	time 2.3920 (2.4023)	model_time 2.3916 (2.3952)	loss 4.4822 (3.4584)	grad_norm 1.1456 (1.1981/0.0848)	mem 48464MB
[2023-11-09 23:59:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][220/1251]	eta 0:41:16 lr 0.193254	time 2.3955 (2.4019)	model_time 2.3952 (2.3951)	loss 3.7014 (3.4455)	grad_norm 1.1685 (1.1945/0.0856)	mem 48464MB
[2023-11-09 23:59:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][230/1251]	eta 0:40:51 lr 0.193163	time 2.3927 (2.4015)	model_time 2.3925 (2.3950)	loss 2.2215 (3.4200)	grad_norm 0.9920 (1.1898/0.0875)	mem 48464MB
[2023-11-09 23:59:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][240/1251]	eta 0:40:27 lr 0.193072	time 2.3936 (2.4011)	model_time 2.3933 (2.3949)	loss 3.7735 (3.3948)	grad_norm 1.0734 (1.1865/0.0876)	mem 48464MB
[2023-11-10 00:00:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][250/1251]	eta 0:40:03 lr 0.192979	time 2.3888 (2.4008)	model_time 2.3885 (2.3948)	loss 3.4988 (3.3863)	grad_norm 1.1764 (1.1844/0.0870)	mem 48464MB
[2023-11-10 00:00:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][260/1251]	eta 0:39:38 lr 0.192887	time 2.3939 (2.4004)	model_time 2.3936 (2.3947)	loss 4.3762 (3.3915)	grad_norm 1.1501 (1.1835/0.0868)	mem 48464MB
[2023-11-10 00:01:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][270/1251]	eta 0:39:14 lr 0.192793	time 2.3929 (2.4001)	model_time 2.3926 (2.3946)	loss 3.8258 (3.3943)	grad_norm 1.1307 (1.1807/0.0869)	mem 48464MB
[2023-11-10 00:01:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][280/1251]	eta 0:38:50 lr 0.192700	time 2.3934 (2.3999)	model_time 2.3931 (2.3945)	loss 3.2529 (3.3884)	grad_norm 1.0581 (1.1773/0.0878)	mem 48464MB
[2023-11-10 00:01:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][290/1251]	eta 0:38:26 lr 0.192605	time 2.3945 (2.3997)	model_time 2.3942 (2.3945)	loss 3.3882 (3.3845)	grad_norm 1.1752 (1.1744/0.0884)	mem 48464MB
[2023-11-10 00:02:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][300/1251]	eta 0:38:01 lr 0.192510	time 2.3933 (2.3994)	model_time 2.3928 (2.3944)	loss 2.6277 (3.3706)	grad_norm 1.0050 (1.1702/0.0894)	mem 48464MB
[2023-11-10 00:02:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][310/1251]	eta 0:37:37 lr 0.192414	time 2.3938 (2.3992)	model_time 2.3935 (2.3943)	loss 2.0348 (3.3626)	grad_norm 1.0781 (1.1627/0.0874)	mem 48464MB
[2023-11-10 00:03:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][320/1251]	eta 0:37:13 lr 0.192318	time 2.3966 (2.3991)	model_time 2.3962 (2.3943)	loss 3.8869 (3.3544)	grad_norm 1.1112 (1.1569/0.0870)	mem 48464MB
[2023-11-10 00:03:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][330/1251]	eta 0:36:49 lr 0.192221	time 2.3934 (2.3989)	model_time 2.3931 (2.3942)	loss 3.6317 (3.3492)	grad_norm 1.0794 (1.1507/0.0866)	mem 48464MB
[2023-11-10 00:03:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][340/1251]	eta 0:36:25 lr 0.192124	time 2.3917 (2.3987)	model_time 2.3914 (2.3942)	loss 3.5749 (3.3552)	grad_norm 1.1298 (1.1452/0.0855)	mem 48464MB
[2023-11-10 00:04:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][350/1251]	eta 0:36:01 lr 0.192026	time 2.3989 (2.3986)	model_time 2.3986 (2.3942)	loss 3.2700 (3.3604)	grad_norm 1.0932 (1.1387/0.0848)	mem 48464MB
[2023-11-10 00:04:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][360/1251]	eta 0:35:37 lr 0.191927	time 2.3920 (2.3985)	model_time 2.3917 (2.3942)	loss 4.0528 (3.3627)	grad_norm 1.1073 (1.1309/0.0854)	mem 48464MB
[2023-11-10 00:05:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][370/1251]	eta 0:35:13 lr 0.191828	time 2.3923 (2.3984)	model_time 2.3920 (2.3942)	loss 3.3416 (3.3584)	grad_norm 0.9717 (1.1257/0.0856)	mem 48464MB
[2023-11-10 00:05:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][380/1251]	eta 0:34:48 lr 0.191729	time 2.3896 (2.3982)	model_time 2.3893 (2.3941)	loss 3.6910 (3.3585)	grad_norm 1.0943 (1.1198/0.0845)	mem 48464MB
[2023-11-10 00:05:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][390/1251]	eta 0:34:24 lr 0.191628	time 2.3909 (2.3981)	model_time 2.3905 (2.3941)	loss 2.9849 (3.3545)	grad_norm 0.9867 (1.1138/0.0863)	mem 48464MB
[2023-11-10 00:06:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][400/1251]	eta 0:34:00 lr 0.191527	time 2.3926 (2.3980)	model_time 2.3923 (2.3941)	loss 2.2723 (3.3364)	grad_norm 0.9556 (1.1078/0.0872)	mem 48464MB
[2023-11-10 00:06:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][410/1251]	eta 0:33:36 lr 0.191426	time 2.3912 (2.3979)	model_time 2.3908 (2.3941)	loss 2.9197 (3.3246)	grad_norm 1.0334 (1.1010/0.0870)	mem 48464MB
[2023-11-10 00:07:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][420/1251]	eta 0:33:12 lr 0.191324	time 2.3893 (2.3977)	model_time 2.3891 (2.3940)	loss 2.0459 (3.3223)	grad_norm 1.0288 (1.0933/0.0849)	mem 48464MB
[2023-11-10 00:07:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][430/1251]	eta 0:32:48 lr 0.191221	time 2.3921 (2.3976)	model_time 2.3918 (2.3939)	loss 2.6919 (3.3228)	grad_norm 0.9187 (1.0880/0.0836)	mem 48464MB
[2023-11-10 00:07:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][440/1251]	eta 0:32:24 lr 0.191118	time 2.3895 (2.3974)	model_time 2.3892 (2.3938)	loss 4.2907 (3.3230)	grad_norm 1.0276 (1.0822/0.0847)	mem 48464MB
[2023-11-10 00:08:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][450/1251]	eta 0:32:00 lr 0.191014	time 2.3911 (2.3973)	model_time 2.3907 (2.3938)	loss 3.6266 (3.3268)	grad_norm 0.9147 (1.0763/0.0834)	mem 48464MB
[2023-11-10 00:08:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][460/1251]	eta 0:31:36 lr 0.190910	time 2.3916 (2.3972)	model_time 2.3913 (2.3938)	loss 3.3834 (3.3232)	grad_norm 0.9574 (1.0711/0.0829)	mem 48464MB
[2023-11-10 00:09:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][470/1251]	eta 0:31:12 lr 0.190805	time 2.3921 (2.3971)	model_time 2.3918 (2.3937)	loss 3.4289 (3.3155)	grad_norm 1.0362 (1.0663/0.0844)	mem 48464MB
[2023-11-10 00:09:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][480/1251]	eta 0:30:48 lr 0.190700	time 2.3939 (2.3970)	model_time 2.3936 (2.3937)	loss 3.9886 (3.3204)	grad_norm 1.0672 (1.0625/0.0839)	mem 48464MB
[2023-11-10 00:09:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][490/1251]	eta 0:30:24 lr 0.190594	time 2.3941 (2.3974)	model_time 2.3938 (2.3941)	loss 3.3574 (3.3145)	grad_norm 0.9557 (1.0564/0.0809)	mem 48464MB
[2023-11-10 00:10:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][500/1251]	eta 0:30:00 lr 0.190487	time 2.3938 (2.3973)	model_time 2.3934 (2.3941)	loss 3.3694 (3.3075)	grad_norm 0.9383 (1.0494/0.0787)	mem 48464MB
[2023-11-10 00:10:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][510/1251]	eta 0:29:36 lr 0.190380	time 2.3949 (2.3973)	model_time 2.3946 (2.3941)	loss 3.3712 (3.3041)	grad_norm 0.9869 (1.0458/0.0781)	mem 48464MB
[2023-11-10 00:11:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][520/1251]	eta 0:29:12 lr 0.190272	time 2.3903 (2.3972)	model_time 2.3901 (2.3941)	loss 2.0942 (3.3027)	grad_norm 0.8647 (1.0410/0.0781)	mem 48464MB
[2023-11-10 00:11:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][530/1251]	eta 0:28:48 lr 0.190164	time 2.3944 (2.3971)	model_time 2.3941 (2.3940)	loss 3.3086 (3.3035)	grad_norm 0.8823 (1.0378/0.0785)	mem 48464MB
[2023-11-10 00:11:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][540/1251]	eta 0:28:24 lr 0.190055	time 2.3908 (2.3970)	model_time 2.3905 (2.3940)	loss 3.5394 (3.2985)	grad_norm 1.0220 (1.0319/0.0795)	mem 48464MB
[2023-11-10 00:12:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][550/1251]	eta 0:28:00 lr 0.189945	time 2.3903 (2.3970)	model_time 2.3900 (2.3940)	loss 3.1480 (3.3017)	grad_norm 0.9531 (1.0256/0.0783)	mem 48464MB
[2023-11-10 00:12:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][560/1251]	eta 0:27:36 lr 0.189835	time 2.3908 (2.3969)	model_time 2.3906 (2.3940)	loss 4.0912 (3.2983)	grad_norm 0.9893 (1.0183/0.0749)	mem 48464MB
[2023-11-10 00:13:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][570/1251]	eta 0:27:12 lr 0.189725	time 2.3927 (2.3968)	model_time 2.3925 (2.3939)	loss 2.4276 (3.2958)	grad_norm 0.9146 (1.0130/0.0744)	mem 48464MB
[2023-11-10 00:13:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][580/1251]	eta 0:26:48 lr 0.189614	time 2.6072 (2.3971)	model_time 2.6069 (2.3943)	loss 3.4824 (3.2977)	grad_norm 1.0230 (1.0087/0.0743)	mem 48464MB
[2023-11-10 00:13:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][590/1251]	eta 0:26:24 lr 0.189502	time 2.3897 (2.3970)	model_time 2.3894 (2.3942)	loss 2.3023 (3.2917)	grad_norm 0.8094 (1.0025/0.0743)	mem 48464MB
[2023-11-10 00:14:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][600/1251]	eta 0:26:00 lr 0.189390	time 2.3920 (2.3969)	model_time 2.3918 (2.3942)	loss 2.6799 (3.2852)	grad_norm 1.0352 (0.9992/0.0745)	mem 48464MB
[2023-11-10 00:14:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][610/1251]	eta 0:25:36 lr 0.189277	time 2.3922 (2.3968)	model_time 2.3920 (2.3941)	loss 3.6445 (3.2838)	grad_norm 0.8931 (0.9942/0.0720)	mem 48464MB
[2023-11-10 00:15:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][620/1251]	eta 0:25:12 lr 0.189163	time 2.3905 (2.3967)	model_time 2.3902 (2.3941)	loss 3.0033 (3.2781)	grad_norm 0.9987 (0.9900/0.0719)	mem 48464MB
[2023-11-10 00:15:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][630/1251]	eta 0:24:48 lr 0.189049	time 2.3903 (2.3967)	model_time 2.3901 (2.3941)	loss 3.3341 (3.2833)	grad_norm 0.9103 (0.9853/0.0713)	mem 48464MB
[2023-11-10 00:15:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][640/1251]	eta 0:24:24 lr 0.188935	time 2.3897 (2.3966)	model_time 2.3895 (2.3940)	loss 2.3965 (3.2773)	grad_norm 0.8654 (0.9795/0.0684)	mem 48464MB
[2023-11-10 00:16:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][650/1251]	eta 0:24:00 lr 0.188820	time 2.3913 (2.3966)	model_time 2.3911 (2.3940)	loss 3.3728 (3.2751)	grad_norm 0.9373 (0.9740/0.0699)	mem 48464MB
[2023-11-10 00:16:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][660/1251]	eta 0:23:36 lr 0.188704	time 2.3927 (2.3968)	model_time 2.3925 (2.3942)	loss 3.2040 (3.2734)	grad_norm 0.8264 (0.9709/0.0698)	mem 48464MB
[2023-11-10 00:17:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][670/1251]	eta 0:23:12 lr 0.188588	time 2.4038 (2.3968)	model_time 2.4036 (2.3943)	loss 3.0877 (3.2648)	grad_norm 0.9033 (0.9658/0.0692)	mem 48464MB
[2023-11-10 00:17:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][680/1251]	eta 0:22:48 lr 0.188471	time 2.3893 (2.3967)	model_time 2.3890 (2.3942)	loss 2.1558 (3.2595)	grad_norm 0.9133 (0.9603/0.0710)	mem 48464MB
[2023-11-10 00:17:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][690/1251]	eta 0:22:24 lr 0.188354	time 2.3909 (2.3967)	model_time 2.3907 (2.3942)	loss 3.4211 (3.2574)	grad_norm 0.8341 (0.9558/0.0714)	mem 48464MB
[2023-11-10 00:18:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][700/1251]	eta 0:22:00 lr 0.188236	time 2.3935 (2.3966)	model_time 2.3933 (2.3942)	loss 3.1013 (3.2521)	grad_norm 0.8226 (0.9516/0.0723)	mem 48464MB
[2023-11-10 00:18:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][710/1251]	eta 0:21:36 lr 0.188117	time 2.3913 (2.3966)	model_time 2.3909 (2.3942)	loss 4.5761 (3.2507)	grad_norm 0.8406 (0.9485/0.0721)	mem 48464MB
[2023-11-10 00:19:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][720/1251]	eta 0:21:12 lr 0.187998	time 2.3934 (2.3965)	model_time 2.3932 (2.3942)	loss 3.2847 (3.2435)	grad_norm 0.9328 (0.9438/0.0738)	mem 48464MB
[2023-11-10 00:19:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][730/1251]	eta 0:20:48 lr 0.187879	time 2.3901 (2.3965)	model_time 2.3897 (2.3942)	loss 2.1469 (3.2382)	grad_norm 0.8432 (0.9394/0.0714)	mem 48464MB
[2023-11-10 00:19:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][740/1251]	eta 0:20:24 lr 0.187759	time 2.3967 (2.3964)	model_time 2.3964 (2.3942)	loss 2.8501 (3.2364)	grad_norm 0.8354 (0.9358/0.0714)	mem 48464MB
[2023-11-10 00:20:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][750/1251]	eta 0:20:00 lr 0.187638	time 2.3885 (2.3964)	model_time 2.3883 (2.3941)	loss 2.7453 (3.2324)	grad_norm 0.9074 (0.9306/0.0719)	mem 48464MB
[2023-11-10 00:20:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][760/1251]	eta 0:19:36 lr 0.187517	time 2.3913 (2.3963)	model_time 2.3909 (2.3941)	loss 3.1500 (3.2362)	grad_norm 0.8236 (0.9253/0.0719)	mem 48464MB
[2023-11-10 00:21:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][770/1251]	eta 0:19:12 lr 0.187395	time 2.3937 (2.3963)	model_time 2.3935 (2.3941)	loss 3.1453 (3.2322)	grad_norm 0.8732 (0.9210/0.0723)	mem 48464MB
[2023-11-10 00:21:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][780/1251]	eta 0:18:48 lr 0.187273	time 2.3927 (2.3963)	model_time 2.3923 (2.3941)	loss 3.2668 (3.2316)	grad_norm 0.7930 (0.9159/0.0696)	mem 48464MB
[2023-11-10 00:21:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][790/1251]	eta 0:18:24 lr 0.187150	time 2.3886 (2.3962)	model_time 2.3884 (2.3940)	loss 2.7103 (3.2306)	grad_norm 0.8026 (0.9120/0.0686)	mem 48464MB
[2023-11-10 00:22:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][800/1251]	eta 0:18:00 lr 0.187026	time 2.3911 (2.3963)	model_time 2.3909 (2.3942)	loss 3.0028 (3.2271)	grad_norm 0.8240 (0.9079/0.0690)	mem 48464MB
[2023-11-10 00:22:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][810/1251]	eta 0:17:36 lr 0.186902	time 2.3930 (2.3963)	model_time 2.3928 (2.3942)	loss 1.9411 (3.2240)	grad_norm 0.8924 (0.9034/0.0670)	mem 48464MB
[2023-11-10 00:23:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][820/1251]	eta 0:17:12 lr 0.186778	time 2.3968 (2.3963)	model_time 2.3964 (2.3942)	loss 3.2369 (3.2253)	grad_norm 0.7853 (0.8982/0.0665)	mem 48464MB
[2023-11-10 00:23:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][830/1251]	eta 0:16:48 lr 0.186653	time 2.3920 (2.3962)	model_time 2.3918 (2.3941)	loss 1.7041 (3.2207)	grad_norm 0.8299 (0.8928/0.0646)	mem 48464MB
[2023-11-10 00:23:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][840/1251]	eta 0:16:24 lr 0.186527	time 2.3902 (2.3962)	model_time 2.3900 (2.3941)	loss 2.2331 (3.2223)	grad_norm 0.8190 (0.8889/0.0660)	mem 48464MB
[2023-11-10 00:24:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][850/1251]	eta 0:16:00 lr 0.186401	time 2.3927 (2.3961)	model_time 2.3925 (2.3941)	loss 3.3533 (3.2204)	grad_norm 0.8725 (0.8860/0.0653)	mem 48464MB
[2023-11-10 00:24:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][860/1251]	eta 0:15:36 lr 0.186274	time 2.3919 (2.3961)	model_time 2.3916 (2.3941)	loss 2.2770 (3.2196)	grad_norm 0.7863 (0.8819/0.0654)	mem 48464MB
[2023-11-10 00:25:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][870/1251]	eta 0:15:12 lr 0.186147	time 2.3936 (2.3961)	model_time 2.3934 (2.3941)	loss 2.7283 (3.2173)	grad_norm 0.7758 (0.8766/0.0651)	mem 48464MB
[2023-11-10 00:25:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][880/1251]	eta 0:14:48 lr 0.186019	time 2.3919 (2.3960)	model_time 2.3917 (2.3940)	loss 3.6980 (3.2179)	grad_norm 0.8093 (0.8720/0.0638)	mem 48464MB
[2023-11-10 00:25:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][890/1251]	eta 0:14:24 lr 0.185891	time 2.3919 (2.3960)	model_time 2.3916 (2.3940)	loss 3.0736 (3.2175)	grad_norm 0.8138 (0.8693/0.0638)	mem 48464MB
[2023-11-10 00:26:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][900/1251]	eta 0:14:00 lr 0.185762	time 2.3963 (2.3960)	model_time 2.3961 (2.3940)	loss 3.5932 (3.2161)	grad_norm 0.7867 (0.8639/0.0620)	mem 48464MB
[2023-11-10 00:26:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][910/1251]	eta 0:13:37 lr 0.185633	time 2.3942 (2.3959)	model_time 2.3939 (2.3940)	loss 2.7080 (3.2148)	grad_norm 0.8259 (0.8605/0.0608)	mem 48464MB
[2023-11-10 00:27:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][920/1251]	eta 0:13:13 lr 0.185503	time 2.3941 (2.3959)	model_time 2.3938 (2.3940)	loss 2.3309 (3.2166)	grad_norm 0.7993 (0.8568/0.0597)	mem 48464MB
[2023-11-10 00:27:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][930/1251]	eta 0:12:49 lr 0.185372	time 2.3925 (2.3959)	model_time 2.3922 (2.3940)	loss 3.2352 (3.2137)	grad_norm 0.9046 (0.8529/0.0593)	mem 48464MB
[2023-11-10 00:27:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][940/1251]	eta 0:12:25 lr 0.185241	time 2.3945 (2.3958)	model_time 2.3943 (2.3940)	loss 2.3373 (3.2141)	grad_norm 0.7384 (0.8493/0.0586)	mem 48464MB
[2023-11-10 00:28:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][950/1251]	eta 0:12:01 lr 0.185109	time 2.3971 (2.3958)	model_time 2.3968 (2.3940)	loss 3.0113 (3.2099)	grad_norm 0.7348 (0.8470/0.0591)	mem 48464MB
[2023-11-10 00:28:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][960/1251]	eta 0:11:37 lr 0.184977	time 2.3901 (2.3959)	model_time 2.3899 (2.3941)	loss 3.9223 (3.2082)	grad_norm 0.8568 (0.8432/0.0576)	mem 48464MB
[2023-11-10 00:28:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][970/1251]	eta 0:11:13 lr 0.184845	time 2.3931 (2.3959)	model_time 2.3929 (2.3941)	loss 2.8605 (3.2035)	grad_norm 0.8845 (0.8394/0.0582)	mem 48464MB
[2023-11-10 00:29:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][980/1251]	eta 0:10:49 lr 0.184712	time 2.3945 (2.3959)	model_time 2.3942 (2.3941)	loss 1.8924 (3.1997)	grad_norm 0.7748 (0.8367/0.0595)	mem 48464MB
[2023-11-10 00:29:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][990/1251]	eta 0:10:25 lr 0.184578	time 2.3898 (2.3958)	model_time 2.3896 (2.3940)	loss 3.1667 (3.1976)	grad_norm 0.7942 (0.8331/0.0598)	mem 48464MB
[2023-11-10 00:30:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1000/1251]	eta 0:10:01 lr 0.184444	time 2.3973 (2.3958)	model_time 2.3969 (2.3940)	loss 3.6863 (3.1944)	grad_norm 0.8371 (0.8304/0.0593)	mem 48464MB
[2023-11-10 00:30:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1010/1251]	eta 0:09:37 lr 0.184309	time 2.3925 (2.3958)	model_time 2.3923 (2.3940)	loss 3.2601 (3.1916)	grad_norm 0.7713 (0.8262/0.0579)	mem 48464MB
[2023-11-10 00:30:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1020/1251]	eta 0:09:13 lr 0.184173	time 2.3934 (2.3958)	model_time 2.3932 (2.3940)	loss 3.8657 (3.1925)	grad_norm 0.7378 (0.8237/0.0582)	mem 48464MB
[2023-11-10 00:31:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1030/1251]	eta 0:08:49 lr 0.184038	time 2.3912 (2.3957)	model_time 2.3909 (2.3939)	loss 3.2934 (3.1926)	grad_norm 0.7798 (0.8195/0.0559)	mem 48464MB
[2023-11-10 00:31:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1040/1251]	eta 0:08:25 lr 0.183901	time 2.3934 (2.3957)	model_time 2.3932 (2.3939)	loss 3.2555 (3.1901)	grad_norm 0.7252 (0.8160/0.0549)	mem 48464MB
[2023-11-10 00:32:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1050/1251]	eta 0:08:01 lr 0.183764	time 2.3921 (2.3957)	model_time 2.3919 (2.3939)	loss 1.9669 (3.1852)	grad_norm 0.7943 (0.8137/0.0552)	mem 48464MB
[2023-11-10 00:32:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1060/1251]	eta 0:07:37 lr 0.183627	time 2.3864 (2.3956)	model_time 2.3862 (2.3939)	loss 3.1765 (3.1826)	grad_norm 0.7752 (0.8110/0.0562)	mem 48464MB
[2023-11-10 00:32:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1070/1251]	eta 0:07:13 lr 0.183489	time 2.3901 (2.3956)	model_time 2.3898 (2.3939)	loss 3.2906 (3.1833)	grad_norm 0.8260 (0.8085/0.0557)	mem 48464MB
[2023-11-10 00:33:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1080/1251]	eta 0:06:49 lr 0.183350	time 2.3896 (2.3956)	model_time 2.3894 (2.3938)	loss 3.9148 (3.1849)	grad_norm 0.8883 (0.8052/0.0534)	mem 48464MB
[2023-11-10 00:33:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1090/1251]	eta 0:06:25 lr 0.183211	time 2.3894 (2.3955)	model_time 2.3892 (2.3938)	loss 3.1177 (3.1831)	grad_norm 0.6771 (0.8011/0.0543)	mem 48464MB
[2023-11-10 00:34:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1100/1251]	eta 0:06:01 lr 0.183072	time 2.3898 (2.3955)	model_time 2.3896 (2.3938)	loss 2.8315 (3.1818)	grad_norm 0.7598 (0.7986/0.0546)	mem 48464MB
[2023-11-10 00:34:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1110/1251]	eta 0:05:37 lr 0.182932	time 2.3935 (2.3955)	model_time 2.3933 (2.3938)	loss 3.3288 (3.1799)	grad_norm 0.7298 (0.7940/0.0542)	mem 48464MB
[2023-11-10 00:34:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1120/1251]	eta 0:05:13 lr 0.182791	time 2.3931 (2.3954)	model_time 2.3928 (2.3938)	loss 3.6816 (3.1781)	grad_norm 0.8061 (0.7921/0.0542)	mem 48464MB
[2023-11-10 00:35:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1130/1251]	eta 0:04:49 lr 0.182650	time 2.3922 (2.3955)	model_time 2.3920 (2.3939)	loss 3.1777 (3.1785)	grad_norm 0.7820 (0.7905/0.0548)	mem 48464MB
[2023-11-10 00:35:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1140/1251]	eta 0:04:25 lr 0.182509	time 2.3935 (2.3955)	model_time 2.3933 (2.3938)	loss 2.6888 (3.1754)	grad_norm 0.7681 (0.7882/0.0547)	mem 48464MB
[2023-11-10 00:36:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1150/1251]	eta 0:04:01 lr 0.182366	time 2.3950 (2.3955)	model_time 2.3947 (2.3938)	loss 3.4342 (3.1741)	grad_norm 0.7421 (0.7846/0.0535)	mem 48464MB
[2023-11-10 00:36:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1160/1251]	eta 0:03:37 lr 0.182224	time 2.3910 (2.3954)	model_time 2.3907 (2.3938)	loss 2.7160 (3.1736)	grad_norm 0.7601 (0.7827/0.0528)	mem 48464MB
[2023-11-10 00:36:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1170/1251]	eta 0:03:14 lr 0.182081	time 2.3920 (2.3954)	model_time 2.3917 (2.3938)	loss 2.1461 (3.1698)	grad_norm 0.7135 (0.7812/0.0535)	mem 48464MB
[2023-11-10 00:37:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1180/1251]	eta 0:02:50 lr 0.181937	time 2.3907 (2.3954)	model_time 2.3904 (2.3938)	loss 3.0036 (3.1702)	grad_norm 0.7752 (0.7789/0.0536)	mem 48464MB
[2023-11-10 00:37:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1190/1251]	eta 0:02:26 lr 0.181793	time 2.3905 (2.3954)	model_time 2.3902 (2.3938)	loss 3.1045 (3.1693)	grad_norm 0.7986 (0.7764/0.0536)	mem 48464MB
[2023-11-10 00:38:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1200/1251]	eta 0:02:02 lr 0.181648	time 2.3912 (2.3953)	model_time 2.3910 (2.3938)	loss 4.0369 (3.1702)	grad_norm 0.7582 (0.7741/0.0534)	mem 48464MB
[2023-11-10 00:38:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1210/1251]	eta 0:01:38 lr 0.181503	time 2.3901 (2.3953)	model_time 2.3897 (2.3937)	loss 1.9163 (3.1680)	grad_norm 0.7553 (0.7706/0.0523)	mem 48464MB
[2023-11-10 00:38:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1220/1251]	eta 0:01:14 lr 0.181357	time 2.3950 (2.3953)	model_time 2.3947 (2.3937)	loss 2.3254 (3.1664)	grad_norm 0.6674 (0.7675/0.0521)	mem 48464MB
[2023-11-10 00:39:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1230/1251]	eta 0:00:50 lr 0.181211	time 2.3918 (2.3953)	model_time 2.3916 (2.3937)	loss 2.8850 (3.1646)	grad_norm 0.6724 (0.7656/0.0518)	mem 48464MB
[2023-11-10 00:39:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1240/1251]	eta 0:00:26 lr 0.181064	time 2.3914 (2.3953)	model_time 2.3913 (2.3937)	loss 2.9520 (3.1655)	grad_norm 0.6843 (0.7627/0.0510)	mem 48464MB
[2023-11-10 00:40:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [1/10][1250/1251]	eta 0:00:02 lr 0.180916	time 2.3918 (2.3953)	model_time 2.3916 (2.3937)	loss 1.7447 (3.1630)	grad_norm 0.6837 (0.7600/0.0501)	mem 48464MB
[2023-11-10 00:40:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 515): INFO EPOCH 1 training takes 0:49:56
[2023-11-10 00:40:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_1.pth saving......
[2023-11-10 00:42:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_1.pth saved !!!
[2023-11-10 00:42:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.756 (3.756)	Loss 0.7197 (0.7197)	Acc@1 85.156 (85.156)	Acc@5 97.559 (97.559)	Mem 48464MB
[2023-11-10 00:42:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.246 (2.374)	Loss 0.8398 (0.7255)	Acc@1 82.617 (85.485)	Acc@5 96.582 (97.470)	Mem 48464MB
[2023-11-10 00:43:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.250 (2.315)	Loss 0.6665 (0.7251)	Acc@1 85.840 (85.398)	Acc@5 98.047 (97.526)	Mem 48464MB
[2023-11-10 00:43:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.252 (2.294)	Loss 0.7969 (0.7311)	Acc@1 83.496 (85.232)	Acc@5 96.777 (97.496)	Mem 48464MB
[2023-11-10 00:43:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.253 (2.284)	Loss 0.7847 (0.7324)	Acc@1 84.180 (85.197)	Acc@5 97.070 (97.513)	Mem 48464MB
[2023-11-10 00:44:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:1] * Acc@1 85.358 Acc@5 97.534
[2023-11-10 00:44:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 339): INFO Accuracy of the network on the 50000 test images: 85.4%
[2023-11-10 00:44:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saving......
[2023-11-10 00:45:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saved !!!
[2023-11-10 00:45:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 354): INFO Max accuracy: 85.36%
[2023-11-10 00:45:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.734 (3.734)	Loss 0.5693 (0.5693)	Acc@1 87.500 (87.500)	Acc@5 98.535 (98.535)	Mem 48464MB
[2023-11-10 00:46:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.247 (2.373)	Loss 0.6846 (0.5705)	Acc@1 85.742 (87.997)	Acc@5 97.168 (98.189)	Mem 48464MB
[2023-11-10 00:46:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.251 (2.315)	Loss 0.4924 (0.5677)	Acc@1 89.160 (88.039)	Acc@5 98.730 (98.247)	Mem 48464MB
[2023-11-10 00:46:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.250 (2.294)	Loss 0.6270 (0.5729)	Acc@1 86.523 (87.758)	Acc@5 97.949 (98.201)	Mem 48464MB
[2023-11-10 00:47:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.253 (2.284)	Loss 0.6211 (0.5743)	Acc@1 86.328 (87.679)	Acc@5 97.461 (98.187)	Mem 48464MB
[2023-11-10 00:47:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:1] * Acc@1 87.724 Acc@5 98.200
[2023-11-10 00:47:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 359): INFO Accuracy of the ema network on the 50000 test images: 87.7%
[2023-11-10 00:47:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saving......
[2023-11-10 00:49:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saved !!!
[2023-11-10 00:49:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 374): INFO Max ema accuracy: 87.72%
[2023-11-10 00:49:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][0/1251]	eta 1:19:29 lr 0.180902	time 3.8124 (3.8124)	model_time 2.3941 (2.3941)	loss 2.7900 (2.7900)	grad_norm 0.6804 (0.6804/0.0000)	mem 48464MB
[2023-11-10 00:49:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][10/1251]	eta 0:51:55 lr 0.180754	time 2.3892 (2.5106)	model_time 2.3886 (2.3813)	loss 2.1954 (2.9299)	grad_norm 0.6628 (0.6994/0.0402)	mem 48464MB
[2023-11-10 00:50:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][20/1251]	eta 0:50:20 lr 0.180605	time 2.3971 (2.4537)	model_time 2.3968 (2.3858)	loss 1.9105 (2.9935)	grad_norm 0.7131 (0.7014/0.0404)	mem 48464MB
[2023-11-10 00:50:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][30/1251]	eta 0:49:42 lr 0.180457	time 2.3958 (2.4428)	model_time 2.3956 (2.3965)	loss 2.3913 (3.0167)	grad_norm 0.6504 (0.7031/0.0484)	mem 48464MB
[2023-11-10 00:51:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][40/1251]	eta 0:49:04 lr 0.180307	time 2.3926 (2.4312)	model_time 2.3922 (2.3961)	loss 3.8777 (3.0298)	grad_norm 0.7748 (0.7015/0.0478)	mem 48464MB
[2023-11-10 00:51:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][50/1251]	eta 0:48:31 lr 0.180157	time 2.3946 (2.4241)	model_time 2.3943 (2.3958)	loss 3.0535 (3.0238)	grad_norm 0.6519 (0.6962/0.0463)	mem 48464MB
[2023-11-10 00:51:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][60/1251]	eta 0:48:01 lr 0.180007	time 2.3912 (2.4191)	model_time 2.3907 (2.3954)	loss 2.7677 (3.0698)	grad_norm 0.7072 (0.6970/0.0461)	mem 48464MB
[2023-11-10 00:52:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][70/1251]	eta 0:47:35 lr 0.179856	time 2.3926 (2.4180)	model_time 2.3922 (2.3976)	loss 3.2162 (3.0902)	grad_norm 0.6395 (0.6976/0.0452)	mem 48464MB
[2023-11-10 00:52:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][80/1251]	eta 0:47:08 lr 0.179705	time 2.3949 (2.4151)	model_time 2.3947 (2.3971)	loss 3.1135 (3.0797)	grad_norm 0.7268 (0.6964/0.0467)	mem 48464MB
[2023-11-10 00:53:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][90/1251]	eta 0:46:41 lr 0.179553	time 2.3907 (2.4130)	model_time 2.3904 (2.3966)	loss 3.4520 (3.0502)	grad_norm 0.6349 (0.6962/0.0468)	mem 48464MB
[2023-11-10 00:53:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][100/1251]	eta 0:46:15 lr 0.179400	time 2.3952 (2.4110)	model_time 2.3950 (2.3961)	loss 3.0568 (3.0530)	grad_norm 0.6378 (0.6951/0.0460)	mem 48464MB
[2023-11-10 00:53:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][110/1251]	eta 0:45:49 lr 0.179247	time 2.3944 (2.4096)	model_time 2.3940 (2.3960)	loss 2.3586 (3.0199)	grad_norm 0.6383 (0.6934/0.0456)	mem 48464MB
[2023-11-10 00:54:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][120/1251]	eta 0:45:23 lr 0.179094	time 2.3917 (2.4082)	model_time 2.3914 (2.3957)	loss 3.0974 (3.0226)	grad_norm 0.7033 (0.6923/0.0447)	mem 48464MB
[2023-11-10 00:54:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][130/1251]	eta 0:44:58 lr 0.178940	time 2.3936 (2.4070)	model_time 2.3934 (2.3955)	loss 3.8719 (3.0192)	grad_norm 0.7025 (0.6943/0.0458)	mem 48464MB
[2023-11-10 00:55:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][140/1251]	eta 0:44:33 lr 0.178786	time 2.3939 (2.4061)	model_time 2.3935 (2.3954)	loss 2.5916 (3.0349)	grad_norm 0.7895 (0.6960/0.0479)	mem 48464MB
[2023-11-10 00:55:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][150/1251]	eta 0:44:08 lr 0.178631	time 2.3921 (2.4053)	model_time 2.3918 (2.3952)	loss 2.0549 (3.0197)	grad_norm 0.7247 (0.6963/0.0468)	mem 48464MB
[2023-11-10 00:55:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][160/1251]	eta 0:43:43 lr 0.178475	time 2.3933 (2.4044)	model_time 2.3931 (2.3949)	loss 3.1798 (3.0255)	grad_norm 0.7073 (0.6962/0.0470)	mem 48464MB
[2023-11-10 00:56:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][170/1251]	eta 0:43:18 lr 0.178319	time 2.3934 (2.4038)	model_time 2.3931 (2.3948)	loss 3.0515 (3.0271)	grad_norm 0.6479 (0.6959/0.0468)	mem 48464MB
[2023-11-10 00:56:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][180/1251]	eta 0:42:53 lr 0.178163	time 2.3921 (2.4031)	model_time 2.3919 (2.3946)	loss 3.2694 (3.0345)	grad_norm 0.6458 (0.6951/0.0470)	mem 48464MB
[2023-11-10 00:56:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][190/1251]	eta 0:42:29 lr 0.178006	time 2.3907 (2.4027)	model_time 2.3904 (2.3946)	loss 2.1086 (3.0354)	grad_norm 0.7279 (0.6953/0.0466)	mem 48464MB
[2023-11-10 00:57:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][200/1251]	eta 0:42:04 lr 0.177849	time 2.3924 (2.4022)	model_time 2.3921 (2.3945)	loss 3.2307 (3.0453)	grad_norm 0.7368 (0.6945/0.0462)	mem 48464MB
[2023-11-10 00:57:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][210/1251]	eta 0:41:40 lr 0.177691	time 2.3954 (2.4018)	model_time 2.3952 (2.3944)	loss 3.1118 (3.0576)	grad_norm 0.6128 (0.6944/0.0458)	mem 48464MB
[2023-11-10 00:58:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][220/1251]	eta 0:41:15 lr 0.177533	time 2.3937 (2.4014)	model_time 2.3935 (2.3944)	loss 3.0829 (3.0501)	grad_norm 0.6839 (0.6933/0.0456)	mem 48464MB
[2023-11-10 00:58:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][230/1251]	eta 0:40:51 lr 0.177374	time 2.3934 (2.4011)	model_time 2.3931 (2.3944)	loss 3.7371 (3.0624)	grad_norm 0.7613 (0.6935/0.0462)	mem 48464MB
[2023-11-10 00:58:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][240/1251]	eta 0:40:27 lr 0.177214	time 2.3964 (2.4008)	model_time 2.3961 (2.3943)	loss 3.2184 (3.0411)	grad_norm 0.7257 (0.6914/0.0470)	mem 48464MB
[2023-11-10 00:59:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][250/1251]	eta 0:40:02 lr 0.177055	time 2.3916 (2.4005)	model_time 2.3913 (2.3943)	loss 2.1749 (3.0323)	grad_norm 0.6649 (0.6904/0.0467)	mem 48464MB
[2023-11-10 00:59:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][260/1251]	eta 0:39:38 lr 0.176894	time 2.3938 (2.4003)	model_time 2.3934 (2.3943)	loss 3.6984 (3.0291)	grad_norm 0.7071 (0.6899/0.0464)	mem 48464MB
[2023-11-10 01:00:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][270/1251]	eta 0:39:14 lr 0.176733	time 2.3954 (2.4000)	model_time 2.3951 (2.3942)	loss 3.5195 (3.0331)	grad_norm 0.6323 (0.6884/0.0468)	mem 48464MB
[2023-11-10 01:00:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][280/1251]	eta 0:38:50 lr 0.176572	time 2.3939 (2.3998)	model_time 2.3936 (2.3942)	loss 3.9272 (3.0311)	grad_norm 0.7331 (0.6872/0.0468)	mem 48464MB
[2023-11-10 01:00:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][290/1251]	eta 0:38:26 lr 0.176410	time 2.3910 (2.3996)	model_time 2.3908 (2.3942)	loss 2.1612 (3.0140)	grad_norm 0.5707 (0.6853/0.0477)	mem 48464MB
[2023-11-10 01:01:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][300/1251]	eta 0:38:01 lr 0.176248	time 2.3938 (2.3994)	model_time 2.3934 (2.3942)	loss 3.3493 (3.0119)	grad_norm 0.6485 (0.6846/0.0481)	mem 48464MB
[2023-11-10 01:01:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][310/1251]	eta 0:37:37 lr 0.176085	time 2.3942 (2.3992)	model_time 2.3940 (2.3941)	loss 3.1144 (3.0111)	grad_norm 0.6704 (0.6827/0.0484)	mem 48464MB
[2023-11-10 01:02:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][320/1251]	eta 0:37:13 lr 0.175922	time 2.3939 (2.3990)	model_time 2.3936 (2.3941)	loss 2.9262 (3.0113)	grad_norm 0.7156 (0.6812/0.0484)	mem 48464MB
[2023-11-10 01:02:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][330/1251]	eta 0:36:49 lr 0.175759	time 2.3915 (2.3992)	model_time 2.3912 (2.3944)	loss 3.3652 (3.0056)	grad_norm 0.6425 (0.6795/0.0479)	mem 48464MB
[2023-11-10 01:02:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][340/1251]	eta 0:36:25 lr 0.175594	time 2.3912 (2.3991)	model_time 2.3909 (2.3944)	loss 2.0265 (2.9997)	grad_norm 0.6042 (0.6785/0.0483)	mem 48464MB
[2023-11-10 01:03:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][350/1251]	eta 0:36:01 lr 0.175430	time 2.3963 (2.3989)	model_time 2.3960 (2.3944)	loss 3.0177 (2.9974)	grad_norm 0.6941 (0.6784/0.0484)	mem 48464MB
[2023-11-10 01:03:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][360/1251]	eta 0:35:37 lr 0.175265	time 2.3897 (2.3988)	model_time 2.3893 (2.3943)	loss 2.4113 (2.9773)	grad_norm 0.6534 (0.6761/0.0483)	mem 48464MB
[2023-11-10 01:04:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][370/1251]	eta 0:35:13 lr 0.175099	time 2.3919 (2.3990)	model_time 2.3915 (2.3947)	loss 3.1304 (2.9726)	grad_norm 0.6783 (0.6742/0.0483)	mem 48464MB
[2023-11-10 01:04:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][380/1251]	eta 0:34:49 lr 0.174933	time 2.3946 (2.3989)	model_time 2.3942 (2.3947)	loss 2.6695 (2.9696)	grad_norm 0.6040 (0.6738/0.0480)	mem 48464MB
[2023-11-10 01:04:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][390/1251]	eta 0:34:25 lr 0.174766	time 2.3889 (2.3989)	model_time 2.3885 (2.3947)	loss 3.4423 (2.9647)	grad_norm 0.7142 (0.6720/0.0478)	mem 48464MB
[2023-11-10 01:05:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][400/1251]	eta 0:34:01 lr 0.174599	time 2.4004 (2.3988)	model_time 2.3997 (2.3947)	loss 1.7787 (2.9638)	grad_norm 0.6406 (0.6703/0.0480)	mem 48464MB
[2023-11-10 01:05:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][410/1251]	eta 0:33:37 lr 0.174432	time 2.3947 (2.3987)	model_time 2.3939 (2.3947)	loss 3.8669 (2.9609)	grad_norm 0.7364 (0.6698/0.0481)	mem 48464MB
[2023-11-10 01:06:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][420/1251]	eta 0:33:13 lr 0.174264	time 2.3951 (2.3986)	model_time 2.3947 (2.3947)	loss 3.1798 (2.9606)	grad_norm 0.7059 (0.6690/0.0481)	mem 48464MB
[2023-11-10 01:06:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][430/1251]	eta 0:32:49 lr 0.174096	time 2.3969 (2.3985)	model_time 2.3962 (2.3947)	loss 3.2048 (2.9672)	grad_norm 0.5896 (0.6667/0.0474)	mem 48464MB
[2023-11-10 01:06:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][440/1251]	eta 0:32:25 lr 0.173927	time 2.3935 (2.3985)	model_time 2.3931 (2.3947)	loss 2.1983 (2.9725)	grad_norm 0.6070 (0.6645/0.0458)	mem 48464MB
[2023-11-10 01:07:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][450/1251]	eta 0:32:01 lr 0.173757	time 2.4006 (2.3984)	model_time 2.4003 (2.3947)	loss 3.7571 (2.9666)	grad_norm 0.6145 (0.6622/0.0457)	mem 48464MB
[2023-11-10 01:07:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][460/1251]	eta 0:31:37 lr 0.173588	time 2.3939 (2.3983)	model_time 2.3935 (2.3948)	loss 2.3476 (2.9640)	grad_norm 0.6283 (0.6597/0.0457)	mem 48464MB
[2023-11-10 01:08:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][470/1251]	eta 0:31:13 lr 0.173417	time 2.3988 (2.3983)	model_time 2.3984 (2.3948)	loss 2.2927 (2.9632)	grad_norm 0.6441 (0.6580/0.0451)	mem 48464MB
[2023-11-10 01:08:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][480/1251]	eta 0:30:49 lr 0.173247	time 2.3923 (2.3983)	model_time 2.3920 (2.3948)	loss 2.5428 (2.9589)	grad_norm 0.6794 (0.6561/0.0449)	mem 48464MB
[2023-11-10 01:08:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][490/1251]	eta 0:30:25 lr 0.173075	time 2.3988 (2.3982)	model_time 2.3983 (2.3948)	loss 3.8845 (2.9633)	grad_norm 0.6743 (0.6539/0.0447)	mem 48464MB
[2023-11-10 01:09:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][500/1251]	eta 0:30:01 lr 0.172904	time 2.3926 (2.3984)	model_time 2.3923 (2.3950)	loss 3.1823 (2.9623)	grad_norm 0.6319 (0.6525/0.0449)	mem 48464MB
[2023-11-10 01:09:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][510/1251]	eta 0:29:37 lr 0.172732	time 2.3917 (2.3983)	model_time 2.3913 (2.3950)	loss 2.9875 (2.9587)	grad_norm 0.6112 (0.6510/0.0446)	mem 48464MB
[2023-11-10 01:10:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][520/1251]	eta 0:29:13 lr 0.172559	time 2.3973 (2.3983)	model_time 2.3969 (2.3951)	loss 2.8665 (2.9580)	grad_norm 0.5991 (0.6500/0.0454)	mem 48464MB
[2023-11-10 01:10:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][530/1251]	eta 0:28:49 lr 0.172386	time 2.3948 (2.3983)	model_time 2.3943 (2.3951)	loss 3.3610 (2.9559)	grad_norm 0.6168 (0.6480/0.0437)	mem 48464MB
[2023-11-10 01:10:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][540/1251]	eta 0:28:25 lr 0.172213	time 2.3922 (2.3985)	model_time 2.3919 (2.3953)	loss 3.2448 (2.9550)	grad_norm 0.6943 (0.6481/0.0437)	mem 48464MB
[2023-11-10 01:11:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][550/1251]	eta 0:28:01 lr 0.172039	time 2.3951 (2.3984)	model_time 2.3947 (2.3953)	loss 2.5023 (2.9557)	grad_norm 0.5880 (0.6468/0.0436)	mem 48464MB
[2023-11-10 01:11:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][560/1251]	eta 0:27:37 lr 0.171864	time 2.3942 (2.3983)	model_time 2.3939 (2.3953)	loss 2.2987 (2.9528)	grad_norm 0.6189 (0.6454/0.0433)	mem 48464MB
[2023-11-10 01:12:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][570/1251]	eta 0:27:13 lr 0.171689	time 2.3929 (2.3982)	model_time 2.3923 (2.3952)	loss 2.9082 (2.9519)	grad_norm 0.5951 (0.6448/0.0431)	mem 48464MB
[2023-11-10 01:12:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][580/1251]	eta 0:26:49 lr 0.171514	time 2.3926 (2.3982)	model_time 2.3922 (2.3952)	loss 1.7369 (2.9444)	grad_norm 0.6765 (0.6437/0.0433)	mem 48464MB
[2023-11-10 01:12:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][590/1251]	eta 0:26:25 lr 0.171338	time 2.3949 (2.3981)	model_time 2.3945 (2.3952)	loss 3.2354 (2.9445)	grad_norm 0.6472 (0.6437/0.0433)	mem 48464MB
[2023-11-10 01:13:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][600/1251]	eta 0:26:01 lr 0.171162	time 2.3940 (2.3980)	model_time 2.3937 (2.3952)	loss 2.6942 (2.9441)	grad_norm 0.6105 (0.6423/0.0425)	mem 48464MB
[2023-11-10 01:13:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][610/1251]	eta 0:25:37 lr 0.170985	time 2.3985 (2.3980)	model_time 2.3980 (2.3951)	loss 2.0666 (2.9428)	grad_norm 0.6054 (0.6419/0.0423)	mem 48464MB
[2023-11-10 01:14:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][620/1251]	eta 0:25:13 lr 0.170808	time 2.3941 (2.3979)	model_time 2.3937 (2.3951)	loss 3.2970 (2.9458)	grad_norm 0.5932 (0.6406/0.0423)	mem 48464MB
[2023-11-10 01:14:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][630/1251]	eta 0:24:49 lr 0.170631	time 2.4012 (2.3979)	model_time 2.4003 (2.3951)	loss 3.4352 (2.9465)	grad_norm 0.5586 (0.6393/0.0427)	mem 48464MB
[2023-11-10 01:14:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][640/1251]	eta 0:24:25 lr 0.170453	time 2.3982 (2.3979)	model_time 2.3979 (2.3951)	loss 3.3573 (2.9394)	grad_norm 0.6073 (0.6376/0.0422)	mem 48464MB
[2023-11-10 01:15:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][650/1251]	eta 0:24:01 lr 0.170274	time 2.3929 (2.3978)	model_time 2.3925 (2.3952)	loss 3.2506 (2.9410)	grad_norm 0.6430 (0.6361/0.0420)	mem 48464MB
[2023-11-10 01:15:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][660/1251]	eta 0:23:37 lr 0.170095	time 2.3979 (2.3978)	model_time 2.3977 (2.3951)	loss 3.4533 (2.9390)	grad_norm 0.6071 (0.6354/0.0422)	mem 48464MB
[2023-11-10 01:16:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][670/1251]	eta 0:23:13 lr 0.169916	time 2.3942 (2.3977)	model_time 2.3939 (2.3951)	loss 3.3415 (2.9377)	grad_norm 0.5940 (0.6345/0.0422)	mem 48464MB
[2023-11-10 01:16:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][680/1251]	eta 0:22:49 lr 0.169736	time 2.3916 (2.3977)	model_time 2.3914 (2.3951)	loss 1.9820 (2.9371)	grad_norm 0.5444 (0.6330/0.0416)	mem 48464MB
[2023-11-10 01:16:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][690/1251]	eta 0:22:25 lr 0.169556	time 2.3939 (2.3976)	model_time 2.3936 (2.3950)	loss 3.5212 (2.9345)	grad_norm 0.6161 (0.6324/0.0416)	mem 48464MB
[2023-11-10 01:17:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][700/1251]	eta 0:22:01 lr 0.169375	time 2.3932 (2.3975)	model_time 2.3928 (2.3950)	loss 3.6144 (2.9357)	grad_norm 0.5667 (0.6312/0.0423)	mem 48464MB
[2023-11-10 01:17:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][710/1251]	eta 0:21:37 lr 0.169194	time 2.3899 (2.3975)	model_time 2.3897 (2.3950)	loss 3.3455 (2.9391)	grad_norm 0.6156 (0.6298/0.0420)	mem 48464MB
[2023-11-10 01:18:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][720/1251]	eta 0:21:13 lr 0.169013	time 2.3898 (2.3974)	model_time 2.3896 (2.3949)	loss 3.1219 (2.9397)	grad_norm 0.5979 (0.6285/0.0419)	mem 48464MB
[2023-11-10 01:18:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][730/1251]	eta 0:20:49 lr 0.168831	time 2.3941 (2.3974)	model_time 2.3938 (2.3949)	loss 2.9458 (2.9432)	grad_norm 0.5911 (0.6266/0.0415)	mem 48464MB
[2023-11-10 01:18:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][740/1251]	eta 0:20:25 lr 0.168649	time 2.3923 (2.3973)	model_time 2.3920 (2.3949)	loss 2.1558 (2.9460)	grad_norm 0.5615 (0.6253/0.0406)	mem 48464MB
[2023-11-10 01:19:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][750/1251]	eta 0:20:01 lr 0.168466	time 2.3949 (2.3973)	model_time 2.3946 (2.3949)	loss 1.8609 (2.9462)	grad_norm 0.5851 (0.6247/0.0408)	mem 48464MB
[2023-11-10 01:19:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][760/1251]	eta 0:19:37 lr 0.168282	time 2.3924 (2.3972)	model_time 2.3919 (2.3949)	loss 3.7342 (2.9468)	grad_norm 0.6724 (0.6244/0.0412)	mem 48464MB
[2023-11-10 01:20:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][770/1251]	eta 0:19:13 lr 0.168099	time 2.3920 (2.3972)	model_time 2.3915 (2.3949)	loss 2.9236 (2.9448)	grad_norm 0.6351 (0.6235/0.0414)	mem 48464MB
[2023-11-10 01:20:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][780/1251]	eta 0:18:49 lr 0.167915	time 2.3930 (2.3972)	model_time 2.3927 (2.3949)	loss 3.7263 (2.9441)	grad_norm 0.6870 (0.6234/0.0414)	mem 48464MB
[2023-11-10 01:20:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][790/1251]	eta 0:18:25 lr 0.167730	time 2.3989 (2.3972)	model_time 2.3985 (2.3949)	loss 3.1835 (2.9454)	grad_norm 0.5600 (0.6228/0.0409)	mem 48464MB
[2023-11-10 01:21:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][800/1251]	eta 0:18:01 lr 0.167545	time 2.3977 (2.3971)	model_time 2.3974 (2.3949)	loss 3.8600 (2.9479)	grad_norm 0.6712 (0.6217/0.0408)	mem 48464MB
[2023-11-10 01:21:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][810/1251]	eta 0:17:37 lr 0.167360	time 2.3905 (2.3972)	model_time 2.3901 (2.3950)	loss 2.5922 (2.9486)	grad_norm 0.5581 (0.6198/0.0410)	mem 48464MB
[2023-11-10 01:22:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][820/1251]	eta 0:17:13 lr 0.167174	time 2.3942 (2.3972)	model_time 2.3939 (2.3950)	loss 1.7806 (2.9499)	grad_norm 0.5870 (0.6185/0.0395)	mem 48464MB
[2023-11-10 01:22:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][830/1251]	eta 0:16:49 lr 0.166988	time 2.3966 (2.3972)	model_time 2.3962 (2.3950)	loss 2.8178 (2.9472)	grad_norm 0.6332 (0.6172/0.0395)	mem 48464MB
[2023-11-10 01:22:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][840/1251]	eta 0:16:25 lr 0.166801	time 2.3919 (2.3974)	model_time 2.3915 (2.3952)	loss 2.7449 (2.9435)	grad_norm 0.6108 (0.6152/0.0393)	mem 48464MB
[2023-11-10 01:23:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][850/1251]	eta 0:16:01 lr 0.166614	time 2.3959 (2.3974)	model_time 2.3948 (2.3952)	loss 3.6416 (2.9450)	grad_norm 0.5915 (0.6144/0.0396)	mem 48464MB
[2023-11-10 01:23:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][860/1251]	eta 0:15:37 lr 0.166426	time 2.3974 (2.3973)	model_time 2.3970 (2.3952)	loss 3.2235 (2.9445)	grad_norm 0.6205 (0.6134/0.0393)	mem 48464MB
[2023-11-10 01:24:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][870/1251]	eta 0:15:13 lr 0.166238	time 2.3983 (2.3973)	model_time 2.3979 (2.3951)	loss 2.3432 (2.9429)	grad_norm 0.5722 (0.6126/0.0395)	mem 48464MB
[2023-11-10 01:24:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][880/1251]	eta 0:14:49 lr 0.166050	time 2.3933 (2.3972)	model_time 2.3929 (2.3951)	loss 3.9877 (2.9446)	grad_norm 0.5767 (0.6121/0.0395)	mem 48464MB
[2023-11-10 01:24:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][890/1251]	eta 0:14:25 lr 0.165861	time 2.3939 (2.3972)	model_time 2.3937 (2.3951)	loss 2.7319 (2.9457)	grad_norm 0.5588 (0.6106/0.0396)	mem 48464MB
[2023-11-10 01:25:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][900/1251]	eta 0:14:01 lr 0.165672	time 2.3942 (2.3972)	model_time 2.3939 (2.3951)	loss 2.8295 (2.9428)	grad_norm 0.5885 (0.6097/0.0404)	mem 48464MB
[2023-11-10 01:25:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][910/1251]	eta 0:13:37 lr 0.165483	time 2.3922 (2.3972)	model_time 2.3916 (2.3951)	loss 2.6920 (2.9433)	grad_norm 0.5445 (0.6082/0.0406)	mem 48464MB
[2023-11-10 01:26:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][920/1251]	eta 0:13:13 lr 0.165293	time 2.3899 (2.3971)	model_time 2.3896 (2.3951)	loss 3.0491 (2.9443)	grad_norm 0.6003 (0.6074/0.0406)	mem 48464MB
[2023-11-10 01:26:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][930/1251]	eta 0:12:49 lr 0.165102	time 2.4020 (2.3971)	model_time 2.4015 (2.3951)	loss 3.0234 (2.9418)	grad_norm 0.5519 (0.6058/0.0404)	mem 48464MB
[2023-11-10 01:26:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][940/1251]	eta 0:12:25 lr 0.164911	time 2.3945 (2.3971)	model_time 2.3942 (2.3951)	loss 2.9781 (2.9398)	grad_norm 0.5818 (0.6054/0.0403)	mem 48464MB
[2023-11-10 01:27:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][950/1251]	eta 0:12:01 lr 0.164720	time 2.4008 (2.3970)	model_time 2.4004 (2.3950)	loss 3.2515 (2.9393)	grad_norm 0.5854 (0.6037/0.0402)	mem 48464MB
[2023-11-10 01:27:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][960/1251]	eta 0:11:37 lr 0.164529	time 2.3968 (2.3970)	model_time 2.3966 (2.3950)	loss 3.6868 (2.9408)	grad_norm 0.6421 (0.6029/0.0402)	mem 48464MB
[2023-11-10 01:28:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][970/1251]	eta 0:11:13 lr 0.164336	time 2.5129 (2.3971)	model_time 2.5124 (2.3952)	loss 2.7838 (2.9363)	grad_norm 0.5755 (0.6013/0.0405)	mem 48464MB
[2023-11-10 01:28:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][980/1251]	eta 0:10:49 lr 0.164144	time 2.3954 (2.3971)	model_time 2.3951 (2.3951)	loss 1.8821 (2.9358)	grad_norm 0.5761 (0.6005/0.0402)	mem 48464MB
[2023-11-10 01:28:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][990/1251]	eta 0:10:25 lr 0.163951	time 2.3949 (2.3971)	model_time 2.3945 (2.3951)	loss 3.4003 (2.9369)	grad_norm 0.5391 (0.5993/0.0404)	mem 48464MB
[2023-11-10 01:29:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1000/1251]	eta 0:10:01 lr 0.163758	time 2.3983 (2.3971)	model_time 2.3980 (2.3951)	loss 3.4817 (2.9353)	grad_norm 0.5554 (0.5982/0.0408)	mem 48464MB
[2023-11-10 01:29:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1010/1251]	eta 0:09:37 lr 0.163564	time 2.3946 (2.3972)	model_time 2.3942 (2.3953)	loss 3.8297 (2.9357)	grad_norm 0.6404 (0.5972/0.0401)	mem 48464MB
[2023-11-10 01:30:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1020/1251]	eta 0:09:13 lr 0.163370	time 2.3951 (2.3972)	model_time 2.3946 (2.3953)	loss 3.9212 (2.9333)	grad_norm 0.6493 (0.5958/0.0399)	mem 48464MB
[2023-11-10 01:30:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1030/1251]	eta 0:08:49 lr 0.163176	time 2.3992 (2.3972)	model_time 2.3988 (2.3953)	loss 3.5540 (2.9338)	grad_norm 0.5916 (0.5954/0.0404)	mem 48464MB
[2023-11-10 01:30:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1040/1251]	eta 0:08:25 lr 0.162981	time 2.3940 (2.3971)	model_time 2.3936 (2.3953)	loss 1.7730 (2.9316)	grad_norm 0.5149 (0.5939/0.0407)	mem 48464MB
[2023-11-10 01:31:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1050/1251]	eta 0:08:01 lr 0.162786	time 2.3937 (2.3971)	model_time 2.3935 (2.3952)	loss 3.3796 (2.9317)	grad_norm 0.5832 (0.5933/0.0406)	mem 48464MB
[2023-11-10 01:31:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1060/1251]	eta 0:07:37 lr 0.162590	time 2.3912 (2.3971)	model_time 2.3909 (2.3952)	loss 3.3723 (2.9346)	grad_norm 0.5599 (0.5929/0.0398)	mem 48464MB
[2023-11-10 01:32:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1070/1251]	eta 0:07:13 lr 0.162394	time 2.3937 (2.3970)	model_time 2.3933 (2.3952)	loss 3.0109 (2.9343)	grad_norm 0.5632 (0.5921/0.0399)	mem 48464MB
[2023-11-10 01:32:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1080/1251]	eta 0:06:49 lr 0.162197	time 2.3987 (2.3970)	model_time 2.3984 (2.3952)	loss 3.0172 (2.9354)	grad_norm 0.5794 (0.5910/0.0398)	mem 48464MB
[2023-11-10 01:32:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1090/1251]	eta 0:06:25 lr 0.162001	time 2.3934 (2.3970)	model_time 2.3927 (2.3952)	loss 2.1352 (2.9361)	grad_norm 0.5490 (0.5892/0.0396)	mem 48464MB
[2023-11-10 01:33:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1100/1251]	eta 0:06:01 lr 0.161803	time 2.3973 (2.3970)	model_time 2.3969 (2.3952)	loss 1.5864 (2.9342)	grad_norm 0.6261 (0.5880/0.0395)	mem 48464MB
[2023-11-10 01:33:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1110/1251]	eta 0:05:37 lr 0.161606	time 2.3929 (2.3970)	model_time 2.3927 (2.3952)	loss 2.9689 (2.9365)	grad_norm 0.5713 (0.5874/0.0392)	mem 48464MB
[2023-11-10 01:34:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1120/1251]	eta 0:05:14 lr 0.161408	time 2.3929 (2.3969)	model_time 2.3927 (2.3951)	loss 2.7002 (2.9368)	grad_norm 0.5016 (0.5862/0.0396)	mem 48464MB
[2023-11-10 01:34:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1130/1251]	eta 0:04:50 lr 0.161209	time 2.3908 (2.3969)	model_time 2.3905 (2.3951)	loss 2.9915 (2.9392)	grad_norm 0.5934 (0.5858/0.0397)	mem 48464MB
[2023-11-10 01:34:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1140/1251]	eta 0:04:26 lr 0.161011	time 2.3937 (2.3969)	model_time 2.3935 (2.3951)	loss 3.8134 (2.9388)	grad_norm 0.6177 (0.5857/0.0395)	mem 48464MB
[2023-11-10 01:35:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1150/1251]	eta 0:04:02 lr 0.160811	time 2.3934 (2.3969)	model_time 2.3932 (2.3951)	loss 2.9324 (2.9404)	grad_norm 0.6516 (0.5853/0.0394)	mem 48464MB
[2023-11-10 01:35:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1160/1251]	eta 0:03:38 lr 0.160612	time 2.3909 (2.3968)	model_time 2.3906 (2.3951)	loss 3.2523 (2.9395)	grad_norm 0.5714 (0.5836/0.0392)	mem 48464MB
[2023-11-10 01:36:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1170/1251]	eta 0:03:14 lr 0.160412	time 2.3929 (2.3968)	model_time 2.3926 (2.3951)	loss 2.5191 (2.9404)	grad_norm 0.6309 (0.5824/0.0391)	mem 48464MB
[2023-11-10 01:36:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1180/1251]	eta 0:02:50 lr 0.160212	time 2.3986 (2.3968)	model_time 2.3984 (2.3950)	loss 3.1896 (2.9408)	grad_norm 0.5782 (0.5808/0.0391)	mem 48464MB
[2023-11-10 01:36:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1190/1251]	eta 0:02:26 lr 0.160011	time 2.3899 (2.3967)	model_time 2.3897 (2.3950)	loss 2.3792 (2.9401)	grad_norm 0.5405 (0.5798/0.0387)	mem 48464MB
[2023-11-10 01:37:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1200/1251]	eta 0:02:02 lr 0.159810	time 2.3966 (2.3967)	model_time 2.3962 (2.3950)	loss 3.1396 (2.9390)	grad_norm 0.5371 (0.5789/0.0386)	mem 48464MB
[2023-11-10 01:37:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1210/1251]	eta 0:01:38 lr 0.159608	time 2.3930 (2.3967)	model_time 2.3926 (2.3950)	loss 3.6682 (2.9380)	grad_norm 0.5710 (0.5787/0.0392)	mem 48464MB
[2023-11-10 01:38:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1220/1251]	eta 0:01:14 lr 0.159407	time 2.3939 (2.3967)	model_time 2.3935 (2.3950)	loss 3.2057 (2.9399)	grad_norm 0.6764 (0.5780/0.0394)	mem 48464MB
[2023-11-10 01:38:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1230/1251]	eta 0:00:50 lr 0.159204	time 2.3917 (2.3966)	model_time 2.3913 (2.3950)	loss 2.6928 (2.9404)	grad_norm 0.5664 (0.5783/0.0393)	mem 48464MB
[2023-11-10 01:38:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1240/1251]	eta 0:00:26 lr 0.159002	time 2.3926 (2.3966)	model_time 2.3925 (2.3949)	loss 2.8942 (2.9406)	grad_norm 0.6159 (0.5768/0.0388)	mem 48464MB
[2023-11-10 01:39:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [2/10][1250/1251]	eta 0:00:02 lr 0.158799	time 2.3922 (2.3966)	model_time 2.3920 (2.3949)	loss 3.2313 (2.9384)	grad_norm 0.6148 (0.5772/0.0398)	mem 48464MB
[2023-11-10 01:39:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 515): INFO EPOCH 2 training takes 0:49:58
[2023-11-10 01:39:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_2.pth saving......
[2023-11-10 01:41:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_2.pth saved !!!
[2023-11-10 01:41:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.782 (3.782)	Loss 0.6753 (0.6753)	Acc@1 87.793 (87.793)	Acc@5 98.438 (98.438)	Mem 48464MB
[2023-11-10 01:41:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.244 (2.376)	Loss 0.7642 (0.6651)	Acc@1 84.375 (86.914)	Acc@5 97.266 (98.074)	Mem 48464MB
[2023-11-10 01:42:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.249 (2.316)	Loss 0.5859 (0.6604)	Acc@1 89.062 (86.900)	Acc@5 98.926 (98.163)	Mem 48464MB
[2023-11-10 01:42:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.250 (2.295)	Loss 0.7285 (0.6667)	Acc@1 84.570 (86.712)	Acc@5 97.168 (98.148)	Mem 48464MB
[2023-11-10 01:42:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.250 (2.284)	Loss 0.7593 (0.6696)	Acc@1 83.984 (86.659)	Acc@5 96.973 (98.116)	Mem 48464MB
[2023-11-10 01:43:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:2] * Acc@1 86.714 Acc@5 98.112
[2023-11-10 01:43:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 339): INFO Accuracy of the network on the 50000 test images: 86.7%
[2023-11-10 01:43:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saving......
[2023-11-10 01:44:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saved !!!
[2023-11-10 01:44:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 354): INFO Max accuracy: 86.71%
[2023-11-10 01:44:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.813 (3.813)	Loss 0.5771 (0.5771)	Acc@1 87.402 (87.402)	Acc@5 98.828 (98.828)	Mem 48464MB
[2023-11-10 01:45:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.244 (2.379)	Loss 0.6807 (0.5816)	Acc@1 85.840 (88.006)	Acc@5 97.949 (98.375)	Mem 48464MB
[2023-11-10 01:45:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.249 (2.317)	Loss 0.5186 (0.5811)	Acc@1 89.453 (88.072)	Acc@5 98.730 (98.382)	Mem 48464MB
[2023-11-10 01:45:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.250 (2.296)	Loss 0.6431 (0.5867)	Acc@1 87.012 (87.922)	Acc@5 98.242 (98.393)	Mem 48464MB
[2023-11-10 01:46:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.248 (2.285)	Loss 0.6479 (0.5887)	Acc@1 86.133 (87.905)	Acc@5 97.461 (98.371)	Mem 48464MB
[2023-11-10 01:46:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:2] * Acc@1 87.960 Acc@5 98.368
[2023-11-10 01:46:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 359): INFO Accuracy of the ema network on the 50000 test images: 88.0%
[2023-11-10 01:46:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saving......
[2023-11-10 01:48:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saved !!!
[2023-11-10 01:48:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 374): INFO Max ema accuracy: 87.96%
[2023-11-10 01:48:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][0/1251]	eta 1:25:51 lr 0.158779	time 4.1179 (4.1179)	model_time 2.3961 (2.3961)	loss 2.7723 (2.7723)	grad_norm 0.5585 (0.5585/0.0000)	mem 48464MB
[2023-11-10 01:48:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][10/1251]	eta 0:52:24 lr 0.158575	time 2.3848 (2.5339)	model_time 2.3845 (2.3771)	loss 3.3341 (3.1189)	grad_norm 0.5492 (0.5680/0.0399)	mem 48464MB
[2023-11-10 01:49:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][20/1251]	eta 0:50:35 lr 0.158371	time 2.3896 (2.4663)	model_time 2.3893 (2.3839)	loss 2.2332 (2.9422)	grad_norm 0.5319 (0.5551/0.0405)	mem 48464MB
[2023-11-10 01:49:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][30/1251]	eta 0:49:42 lr 0.158167	time 2.3906 (2.4427)	model_time 2.3902 (2.3867)	loss 3.0839 (2.9125)	grad_norm 0.5832 (0.5571/0.0360)	mem 48464MB
[2023-11-10 01:50:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][40/1251]	eta 0:49:09 lr 0.157963	time 2.5990 (2.4353)	model_time 2.5985 (2.3929)	loss 2.9835 (2.9390)	grad_norm 0.5748 (0.5577/0.0343)	mem 48464MB
[2023-11-10 01:50:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][50/1251]	eta 0:48:34 lr 0.157758	time 2.3930 (2.4268)	model_time 2.3927 (2.3926)	loss 2.5849 (2.9333)	grad_norm 0.5526 (0.5583/0.0347)	mem 48464MB
[2023-11-10 01:50:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][60/1251]	eta 0:48:03 lr 0.157553	time 2.3949 (2.4211)	model_time 2.3947 (2.3924)	loss 1.9493 (2.8767)	grad_norm 0.6175 (0.5582/0.0346)	mem 48464MB
[2023-11-10 01:51:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][70/1251]	eta 0:47:34 lr 0.157347	time 2.3928 (2.4171)	model_time 2.3925 (2.3924)	loss 3.1513 (2.8873)	grad_norm 0.6410 (0.5604/0.0373)	mem 48464MB
[2023-11-10 01:51:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][80/1251]	eta 0:47:07 lr 0.157141	time 2.3987 (2.4145)	model_time 2.3982 (2.3927)	loss 2.8797 (2.9024)	grad_norm 0.5392 (0.5612/0.0376)	mem 48464MB
[2023-11-10 01:52:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][90/1251]	eta 0:46:40 lr 0.156935	time 2.3939 (2.4123)	model_time 2.3935 (2.3930)	loss 2.7424 (2.8690)	grad_norm 0.5571 (0.5596/0.0371)	mem 48464MB
[2023-11-10 01:52:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][100/1251]	eta 0:46:14 lr 0.156729	time 2.3909 (2.4105)	model_time 2.3905 (2.3930)	loss 2.5735 (2.8933)	grad_norm 0.5555 (0.5603/0.0392)	mem 48464MB
[2023-11-10 01:52:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][110/1251]	eta 0:45:48 lr 0.156522	time 2.3944 (2.4089)	model_time 2.3937 (2.3929)	loss 3.0338 (2.8671)	grad_norm 0.5080 (0.5581/0.0392)	mem 48464MB
[2023-11-10 01:53:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][120/1251]	eta 0:45:23 lr 0.156314	time 2.3929 (2.4077)	model_time 2.3923 (2.3930)	loss 3.3313 (2.8579)	grad_norm 0.5826 (0.5569/0.0392)	mem 48464MB
[2023-11-10 01:53:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][130/1251]	eta 0:44:57 lr 0.156107	time 2.3931 (2.4068)	model_time 2.3927 (2.3931)	loss 3.1726 (2.8425)	grad_norm 0.5104 (0.5548/0.0390)	mem 48464MB
[2023-11-10 01:54:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][140/1251]	eta 0:44:32 lr 0.155898	time 2.3955 (2.4059)	model_time 2.3949 (2.3932)	loss 3.7078 (2.8476)	grad_norm 0.6229 (0.5547/0.0389)	mem 48464MB
[2023-11-10 01:54:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][150/1251]	eta 0:44:08 lr 0.155690	time 2.3952 (2.4052)	model_time 2.3948 (2.3933)	loss 2.9556 (2.8449)	grad_norm 0.5220 (0.5549/0.0388)	mem 48464MB
[2023-11-10 01:54:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][160/1251]	eta 0:43:43 lr 0.155481	time 2.3928 (2.4045)	model_time 2.3925 (2.3934)	loss 2.9468 (2.8519)	grad_norm 0.5245 (0.5547/0.0385)	mem 48464MB
[2023-11-10 01:55:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][170/1251]	eta 0:43:18 lr 0.155272	time 2.3947 (2.4040)	model_time 2.3943 (2.3934)	loss 3.3721 (2.8591)	grad_norm 0.5270 (0.5547/0.0393)	mem 48464MB
[2023-11-10 01:55:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][180/1251]	eta 0:42:54 lr 0.155063	time 2.3931 (2.4034)	model_time 2.3923 (2.3934)	loss 3.7200 (2.8542)	grad_norm 0.6495 (0.5539/0.0399)	mem 48464MB
[2023-11-10 01:56:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][190/1251]	eta 0:42:29 lr 0.154853	time 2.3918 (2.4030)	model_time 2.3914 (2.3935)	loss 3.3878 (2.8453)	grad_norm 0.5143 (0.5534/0.0399)	mem 48464MB
[2023-11-10 01:56:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][200/1251]	eta 0:42:05 lr 0.154643	time 2.3907 (2.4026)	model_time 2.3903 (2.3935)	loss 3.6829 (2.8428)	grad_norm 0.5821 (0.5526/0.0396)	mem 48464MB
[2023-11-10 01:56:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][210/1251]	eta 0:41:40 lr 0.154432	time 2.3955 (2.4023)	model_time 2.3951 (2.3936)	loss 3.7415 (2.8511)	grad_norm 0.5416 (0.5530/0.0397)	mem 48464MB
[2023-11-10 01:57:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][220/1251]	eta 0:41:16 lr 0.154221	time 2.3891 (2.4019)	model_time 2.3887 (2.3936)	loss 2.0556 (2.8592)	grad_norm 0.5697 (0.5530/0.0394)	mem 48464MB
[2023-11-10 01:57:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][230/1251]	eta 0:40:52 lr 0.154010	time 2.3922 (2.4016)	model_time 2.3918 (2.3936)	loss 3.0067 (2.8628)	grad_norm 0.5222 (0.5527/0.0390)	mem 48464MB
[2023-11-10 01:58:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][240/1251]	eta 0:40:27 lr 0.153799	time 2.3928 (2.4013)	model_time 2.3923 (2.3936)	loss 3.6135 (2.8669)	grad_norm 0.5683 (0.5524/0.0387)	mem 48464MB
[2023-11-10 01:58:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][250/1251]	eta 0:40:03 lr 0.153587	time 2.3952 (2.4009)	model_time 2.3950 (2.3936)	loss 3.7587 (2.8637)	grad_norm 0.5545 (0.5522/0.0382)	mem 48464MB
[2023-11-10 01:58:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][260/1251]	eta 0:39:39 lr 0.153375	time 2.3943 (2.4013)	model_time 2.3938 (2.3942)	loss 3.0600 (2.8589)	grad_norm 0.6511 (0.5520/0.0388)	mem 48464MB
[2023-11-10 01:59:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][270/1251]	eta 0:39:15 lr 0.153162	time 2.3947 (2.4011)	model_time 2.3943 (2.3943)	loss 2.4033 (2.8586)	grad_norm 0.5011 (0.5515/0.0387)	mem 48464MB
[2023-11-10 01:59:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][280/1251]	eta 0:38:51 lr 0.152949	time 2.3975 (2.4009)	model_time 2.3971 (2.3943)	loss 2.4511 (2.8552)	grad_norm 0.5435 (0.5514/0.0388)	mem 48464MB
[2023-11-10 02:00:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][290/1251]	eta 0:38:27 lr 0.152736	time 2.3965 (2.4012)	model_time 2.3959 (2.3948)	loss 3.2691 (2.8707)	grad_norm 0.5372 (0.5512/0.0388)	mem 48464MB
[2023-11-10 02:00:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][300/1251]	eta 0:38:03 lr 0.152523	time 2.3963 (2.4011)	model_time 2.3958 (2.3949)	loss 2.4234 (2.8768)	grad_norm 0.5301 (0.5512/0.0387)	mem 48464MB
[2023-11-10 02:00:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][310/1251]	eta 0:37:39 lr 0.152309	time 2.3943 (2.4008)	model_time 2.3939 (2.3948)	loss 2.2271 (2.8831)	grad_norm 0.5624 (0.5510/0.0382)	mem 48464MB
[2023-11-10 02:01:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][320/1251]	eta 0:37:15 lr 0.152095	time 2.3939 (2.4007)	model_time 2.3936 (2.3948)	loss 2.4532 (2.8689)	grad_norm 0.5058 (0.5510/0.0379)	mem 48464MB
[2023-11-10 02:01:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][330/1251]	eta 0:36:50 lr 0.151880	time 2.3939 (2.4005)	model_time 2.3935 (2.3948)	loss 1.9422 (2.8581)	grad_norm 0.5349 (0.5503/0.0382)	mem 48464MB
[2023-11-10 02:02:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][340/1251]	eta 0:36:26 lr 0.151665	time 2.3907 (2.4003)	model_time 2.3902 (2.3948)	loss 3.1165 (2.8581)	grad_norm 0.5738 (0.5500/0.0381)	mem 48464MB
[2023-11-10 02:02:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][350/1251]	eta 0:36:02 lr 0.151450	time 2.3942 (2.4001)	model_time 2.3939 (2.3948)	loss 3.5188 (2.8508)	grad_norm 0.4997 (0.5491/0.0384)	mem 48464MB
[2023-11-10 02:02:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][360/1251]	eta 0:35:38 lr 0.151234	time 2.3975 (2.4000)	model_time 2.3972 (2.3947)	loss 2.9625 (2.8529)	grad_norm 0.5734 (0.5485/0.0384)	mem 48464MB
[2023-11-10 02:03:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][370/1251]	eta 0:35:14 lr 0.151019	time 2.3922 (2.3998)	model_time 2.3917 (2.3947)	loss 3.0798 (2.8512)	grad_norm 0.6275 (0.5479/0.0377)	mem 48464MB
[2023-11-10 02:03:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][380/1251]	eta 0:34:50 lr 0.150803	time 2.3942 (2.3996)	model_time 2.3938 (2.3946)	loss 1.8349 (2.8487)	grad_norm 0.5036 (0.5469/0.0374)	mem 48464MB
[2023-11-10 02:04:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][390/1251]	eta 0:34:25 lr 0.150586	time 2.3966 (2.3995)	model_time 2.3963 (2.3946)	loss 1.7155 (2.8496)	grad_norm 0.6044 (0.5473/0.0379)	mem 48464MB
[2023-11-10 02:04:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][400/1251]	eta 0:34:01 lr 0.150369	time 2.3960 (2.3994)	model_time 2.3956 (2.3946)	loss 2.2088 (2.8497)	grad_norm 0.5531 (0.5465/0.0366)	mem 48464MB
[2023-11-10 02:04:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][410/1251]	eta 0:33:37 lr 0.150152	time 2.3980 (2.3993)	model_time 2.3976 (2.3946)	loss 3.5435 (2.8488)	grad_norm 0.5369 (0.5475/0.0364)	mem 48464MB
[2023-11-10 02:05:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][420/1251]	eta 0:33:13 lr 0.149935	time 2.3936 (2.3992)	model_time 2.3933 (2.3947)	loss 3.6818 (2.8582)	grad_norm 0.5825 (0.5481/0.0365)	mem 48464MB
[2023-11-10 02:05:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][430/1251]	eta 0:32:49 lr 0.149717	time 2.3917 (2.3991)	model_time 2.3910 (2.3946)	loss 3.0542 (2.8595)	grad_norm 0.4993 (0.5481/0.0365)	mem 48464MB
[2023-11-10 02:06:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][440/1251]	eta 0:32:25 lr 0.149499	time 2.3942 (2.3990)	model_time 2.3939 (2.3946)	loss 2.3421 (2.8516)	grad_norm 0.5892 (0.5472/0.0366)	mem 48464MB
[2023-11-10 02:06:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][450/1251]	eta 0:32:01 lr 0.149281	time 2.3935 (2.3989)	model_time 2.3931 (2.3946)	loss 2.5258 (2.8525)	grad_norm 0.5147 (0.5467/0.0362)	mem 48464MB
[2023-11-10 02:06:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][460/1251]	eta 0:31:37 lr 0.149062	time 2.3925 (2.3993)	model_time 2.3919 (2.3951)	loss 3.2536 (2.8554)	grad_norm 0.6217 (0.5463/0.0364)	mem 48464MB
[2023-11-10 02:07:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][470/1251]	eta 0:31:13 lr 0.148843	time 2.3913 (2.3992)	model_time 2.3907 (2.3951)	loss 2.1178 (2.8558)	grad_norm 0.5342 (0.5457/0.0356)	mem 48464MB
[2023-11-10 02:07:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][480/1251]	eta 0:30:49 lr 0.148624	time 2.3944 (2.3990)	model_time 2.3940 (2.3950)	loss 2.9789 (2.8520)	grad_norm 0.5000 (0.5457/0.0348)	mem 48464MB
[2023-11-10 02:08:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][490/1251]	eta 0:30:25 lr 0.148404	time 2.3993 (2.3990)	model_time 2.3988 (2.3950)	loss 3.2053 (2.8538)	grad_norm 0.5239 (0.5458/0.0343)	mem 48464MB
[2023-11-10 02:08:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][500/1251]	eta 0:30:01 lr 0.148184	time 2.3936 (2.3989)	model_time 2.3930 (2.3950)	loss 3.1333 (2.8544)	grad_norm 0.5704 (0.5453/0.0350)	mem 48464MB
[2023-11-10 02:08:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][510/1251]	eta 0:29:37 lr 0.147964	time 2.3917 (2.3988)	model_time 2.3913 (2.3949)	loss 1.7333 (2.8505)	grad_norm 0.6197 (0.5449/0.0349)	mem 48464MB
[2023-11-10 02:09:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][520/1251]	eta 0:29:13 lr 0.147743	time 2.3943 (2.3987)	model_time 2.3940 (2.3949)	loss 3.3196 (2.8517)	grad_norm 0.6058 (0.5449/0.0348)	mem 48464MB
[2023-11-10 02:09:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][530/1251]	eta 0:28:49 lr 0.147523	time 2.3922 (2.3986)	model_time 2.3916 (2.3949)	loss 1.8742 (2.8482)	grad_norm 0.5537 (0.5447/0.0348)	mem 48464MB
[2023-11-10 02:10:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][540/1251]	eta 0:28:25 lr 0.147302	time 2.4001 (2.3986)	model_time 2.3995 (2.3949)	loss 3.2235 (2.8500)	grad_norm 0.5587 (0.5446/0.0349)	mem 48464MB
[2023-11-10 02:10:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][550/1251]	eta 0:28:01 lr 0.147080	time 2.3925 (2.3985)	model_time 2.3921 (2.3949)	loss 2.9505 (2.8447)	grad_norm 0.5107 (0.5439/0.0353)	mem 48464MB
[2023-11-10 02:10:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][560/1251]	eta 0:27:37 lr 0.146858	time 2.3944 (2.3984)	model_time 2.3940 (2.3949)	loss 1.6106 (2.8465)	grad_norm 0.5934 (0.5439/0.0346)	mem 48464MB
[2023-11-10 02:11:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][570/1251]	eta 0:27:13 lr 0.146636	time 2.3944 (2.3984)	model_time 2.3940 (2.3949)	loss 2.8095 (2.8493)	grad_norm 0.5428 (0.5442/0.0349)	mem 48464MB
[2023-11-10 02:11:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][580/1251]	eta 0:26:49 lr 0.146414	time 2.3939 (2.3983)	model_time 2.3937 (2.3949)	loss 1.8690 (2.8464)	grad_norm 0.5255 (0.5434/0.0347)	mem 48464MB
[2023-11-10 02:12:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][590/1251]	eta 0:26:25 lr 0.146192	time 2.3971 (2.3983)	model_time 2.3967 (2.3949)	loss 1.9687 (2.8429)	grad_norm 0.5021 (0.5431/0.0346)	mem 48464MB
[2023-11-10 02:12:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][600/1251]	eta 0:26:01 lr 0.145969	time 2.3924 (2.3982)	model_time 2.3920 (2.3949)	loss 2.7954 (2.8415)	grad_norm 0.4966 (0.5421/0.0344)	mem 48464MB
[2023-11-10 02:12:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][610/1251]	eta 0:25:37 lr 0.145746	time 2.3922 (2.3981)	model_time 2.3917 (2.3948)	loss 1.5987 (2.8400)	grad_norm 0.4932 (0.5412/0.0345)	mem 48464MB
[2023-11-10 02:13:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][620/1251]	eta 0:25:13 lr 0.145522	time 2.3929 (2.3981)	model_time 2.3926 (2.3948)	loss 2.8542 (2.8444)	grad_norm 0.5479 (0.5413/0.0352)	mem 48464MB
[2023-11-10 02:13:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][630/1251]	eta 0:24:49 lr 0.145298	time 2.3921 (2.3980)	model_time 2.3918 (2.3948)	loss 2.6420 (2.8402)	grad_norm 0.5071 (0.5415/0.0353)	mem 48464MB
[2023-11-10 02:14:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][640/1251]	eta 0:24:25 lr 0.145074	time 2.3909 (2.3979)	model_time 2.3904 (2.3948)	loss 2.2113 (2.8413)	grad_norm 0.5393 (0.5412/0.0355)	mem 48464MB
[2023-11-10 02:14:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][650/1251]	eta 0:24:01 lr 0.144850	time 2.3947 (2.3979)	model_time 2.3943 (2.3947)	loss 2.9478 (2.8450)	grad_norm 0.5135 (0.5417/0.0353)	mem 48464MB
[2023-11-10 02:14:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][660/1251]	eta 0:23:37 lr 0.144625	time 2.3900 (2.3978)	model_time 2.3894 (2.3947)	loss 2.7245 (2.8449)	grad_norm 0.5263 (0.5418/0.0352)	mem 48464MB
[2023-11-10 02:15:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][670/1251]	eta 0:23:13 lr 0.144401	time 2.3977 (2.3978)	model_time 2.3973 (2.3947)	loss 2.1087 (2.8431)	grad_norm 0.5098 (0.5408/0.0346)	mem 48464MB
[2023-11-10 02:15:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][680/1251]	eta 0:22:49 lr 0.144175	time 2.3938 (2.3978)	model_time 2.3934 (2.3947)	loss 3.6386 (2.8425)	grad_norm 0.4865 (0.5406/0.0347)	mem 48464MB
[2023-11-10 02:16:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][690/1251]	eta 0:22:25 lr 0.143950	time 2.3883 (2.3977)	model_time 2.3880 (2.3947)	loss 3.0101 (2.8435)	grad_norm 0.5225 (0.5403/0.0342)	mem 48464MB
[2023-11-10 02:16:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][700/1251]	eta 0:22:01 lr 0.143724	time 2.3923 (2.3976)	model_time 2.3919 (2.3947)	loss 2.1497 (2.8426)	grad_norm 0.5326 (0.5400/0.0342)	mem 48464MB
[2023-11-10 02:16:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][710/1251]	eta 0:21:37 lr 0.143498	time 2.3938 (2.3976)	model_time 2.3935 (2.3947)	loss 3.6297 (2.8418)	grad_norm 0.6493 (0.5389/0.0348)	mem 48464MB
[2023-11-10 02:17:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][720/1251]	eta 0:21:13 lr 0.143272	time 2.3915 (2.3975)	model_time 2.3911 (2.3947)	loss 3.8001 (2.8492)	grad_norm 0.5672 (0.5387/0.0344)	mem 48464MB
[2023-11-10 02:17:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][730/1251]	eta 0:20:49 lr 0.143045	time 2.3948 (2.3975)	model_time 2.3945 (2.3946)	loss 3.4965 (2.8516)	grad_norm 0.5302 (0.5384/0.0345)	mem 48464MB
[2023-11-10 02:17:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][740/1251]	eta 0:20:25 lr 0.142819	time 2.3921 (2.3974)	model_time 2.3916 (2.3946)	loss 3.9562 (2.8476)	grad_norm 0.5731 (0.5385/0.0346)	mem 48464MB
[2023-11-10 02:18:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][750/1251]	eta 0:20:01 lr 0.142592	time 2.3953 (2.3974)	model_time 2.3949 (2.3946)	loss 1.9333 (2.8472)	grad_norm 0.4928 (0.5381/0.0345)	mem 48464MB
[2023-11-10 02:18:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][760/1251]	eta 0:19:37 lr 0.142364	time 2.3925 (2.3975)	model_time 2.3920 (2.3948)	loss 2.3651 (2.8475)	grad_norm 0.5037 (0.5381/0.0344)	mem 48464MB
[2023-11-10 02:19:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][770/1251]	eta 0:19:13 lr 0.142137	time 2.3924 (2.3975)	model_time 2.3921 (2.3948)	loss 3.1023 (2.8473)	grad_norm 0.5753 (0.5378/0.0343)	mem 48464MB
[2023-11-10 02:19:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][780/1251]	eta 0:18:49 lr 0.141909	time 2.3988 (2.3975)	model_time 2.3985 (2.3948)	loss 2.9406 (2.8475)	grad_norm 0.5285 (0.5372/0.0343)	mem 48464MB
[2023-11-10 02:19:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][790/1251]	eta 0:18:25 lr 0.141681	time 2.3981 (2.3975)	model_time 2.3977 (2.3948)	loss 3.0494 (2.8474)	grad_norm 0.5532 (0.5375/0.0344)	mem 48464MB
[2023-11-10 02:20:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][800/1251]	eta 0:18:01 lr 0.141452	time 2.3925 (2.3977)	model_time 2.3921 (2.3950)	loss 2.7819 (2.8476)	grad_norm 0.4995 (0.5378/0.0341)	mem 48464MB
[2023-11-10 02:20:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][810/1251]	eta 0:17:37 lr 0.141224	time 2.3969 (2.3976)	model_time 2.3966 (2.3950)	loss 3.9031 (2.8515)	grad_norm 0.5990 (0.5372/0.0340)	mem 48464MB
[2023-11-10 02:21:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][820/1251]	eta 0:17:13 lr 0.140995	time 2.3956 (2.3976)	model_time 2.3952 (2.3950)	loss 3.0356 (2.8504)	grad_norm 0.5095 (0.5363/0.0342)	mem 48464MB
[2023-11-10 02:21:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][830/1251]	eta 0:16:49 lr 0.140765	time 2.3933 (2.3976)	model_time 2.3931 (2.3950)	loss 2.3583 (2.8520)	grad_norm 0.5584 (0.5361/0.0342)	mem 48464MB
[2023-11-10 02:21:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][840/1251]	eta 0:16:25 lr 0.140536	time 2.3935 (2.3976)	model_time 2.3930 (2.3950)	loss 2.7970 (2.8544)	grad_norm 0.5123 (0.5356/0.0342)	mem 48464MB
[2023-11-10 02:22:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][850/1251]	eta 0:16:01 lr 0.140306	time 2.3942 (2.3975)	model_time 2.3938 (2.3950)	loss 2.9278 (2.8553)	grad_norm 0.5490 (0.5356/0.0342)	mem 48464MB
[2023-11-10 02:22:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][860/1251]	eta 0:15:37 lr 0.140076	time 2.3924 (2.3974)	model_time 2.3921 (2.3950)	loss 3.8155 (2.8540)	grad_norm 0.5096 (0.5350/0.0339)	mem 48464MB
[2023-11-10 02:23:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][870/1251]	eta 0:15:13 lr 0.139846	time 2.3909 (2.3974)	model_time 2.3906 (2.3949)	loss 2.9893 (2.8563)	grad_norm 0.5275 (0.5345/0.0336)	mem 48464MB
[2023-11-10 02:23:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][880/1251]	eta 0:14:49 lr 0.139616	time 2.3929 (2.3974)	model_time 2.3926 (2.3949)	loss 3.3151 (2.8558)	grad_norm 0.5121 (0.5348/0.0343)	mem 48464MB
[2023-11-10 02:23:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][890/1251]	eta 0:14:25 lr 0.139385	time 2.3965 (2.3973)	model_time 2.3960 (2.3949)	loss 3.1270 (2.8587)	grad_norm 0.5507 (0.5341/0.0343)	mem 48464MB
[2023-11-10 02:24:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][900/1251]	eta 0:14:01 lr 0.139154	time 2.3840 (2.3973)	model_time 2.3835 (2.3949)	loss 3.4516 (2.8594)	grad_norm 0.5286 (0.5338/0.0348)	mem 48464MB
[2023-11-10 02:24:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][910/1251]	eta 0:13:37 lr 0.138923	time 2.3962 (2.3972)	model_time 2.3957 (2.3948)	loss 1.9207 (2.8614)	grad_norm 0.5423 (0.5335/0.0347)	mem 48464MB
[2023-11-10 02:25:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][920/1251]	eta 0:13:13 lr 0.138691	time 2.3911 (2.3972)	model_time 2.3907 (2.3948)	loss 3.0065 (2.8642)	grad_norm 0.5902 (0.5332/0.0344)	mem 48464MB
[2023-11-10 02:25:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][930/1251]	eta 0:12:49 lr 0.138460	time 2.3943 (2.3974)	model_time 2.3939 (2.3950)	loss 3.6207 (2.8656)	grad_norm 0.5371 (0.5324/0.0344)	mem 48464MB
[2023-11-10 02:25:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][940/1251]	eta 0:12:25 lr 0.138228	time 2.3957 (2.3973)	model_time 2.3953 (2.3950)	loss 2.5845 (2.8670)	grad_norm 0.4923 (0.5316/0.0342)	mem 48464MB
[2023-11-10 02:26:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][950/1251]	eta 0:12:01 lr 0.137996	time 2.3944 (2.3973)	model_time 2.3941 (2.3950)	loss 3.0614 (2.8700)	grad_norm 0.5274 (0.5310/0.0339)	mem 48464MB
[2023-11-10 02:26:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][960/1251]	eta 0:11:37 lr 0.137763	time 2.5280 (2.3974)	model_time 2.5276 (2.3951)	loss 3.0603 (2.8693)	grad_norm 0.5250 (0.5303/0.0342)	mem 48464MB
[2023-11-10 02:27:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][970/1251]	eta 0:11:13 lr 0.137531	time 2.3948 (2.3974)	model_time 2.3944 (2.3951)	loss 2.4961 (2.8698)	grad_norm 0.5018 (0.5301/0.0348)	mem 48464MB
[2023-11-10 02:27:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][980/1251]	eta 0:10:49 lr 0.137298	time 2.3954 (2.3974)	model_time 2.3951 (2.3951)	loss 2.7976 (2.8689)	grad_norm 0.5258 (0.5298/0.0349)	mem 48464MB
[2023-11-10 02:27:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][990/1251]	eta 0:10:25 lr 0.137064	time 2.3951 (2.3973)	model_time 2.3947 (2.3951)	loss 2.0364 (2.8698)	grad_norm 0.4943 (0.5291/0.0344)	mem 48464MB
[2023-11-10 02:28:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1000/1251]	eta 0:10:01 lr 0.136831	time 2.3914 (2.3973)	model_time 2.3910 (2.3951)	loss 3.1742 (2.8697)	grad_norm 0.4997 (0.5285/0.0346)	mem 48464MB
[2023-11-10 02:28:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1010/1251]	eta 0:09:37 lr 0.136598	time 2.3942 (2.3973)	model_time 2.3939 (2.3951)	loss 3.5787 (2.8700)	grad_norm 0.5617 (0.5291/0.0348)	mem 48464MB
[2023-11-10 02:29:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1020/1251]	eta 0:09:13 lr 0.136364	time 2.3932 (2.3972)	model_time 2.3930 (2.3951)	loss 2.8857 (2.8670)	grad_norm 0.4781 (0.5277/0.0347)	mem 48464MB
[2023-11-10 02:29:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1030/1251]	eta 0:08:49 lr 0.136130	time 2.3972 (2.3972)	model_time 2.3969 (2.3950)	loss 2.9962 (2.8667)	grad_norm 0.4821 (0.5273/0.0348)	mem 48464MB
[2023-11-10 02:29:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1040/1251]	eta 0:08:25 lr 0.135895	time 2.3976 (2.3971)	model_time 2.3972 (2.3950)	loss 3.4421 (2.8668)	grad_norm 0.5020 (0.5267/0.0344)	mem 48464MB
[2023-11-10 02:30:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1050/1251]	eta 0:08:01 lr 0.135661	time 2.3981 (2.3971)	model_time 2.3975 (2.3950)	loss 3.0854 (2.8680)	grad_norm 0.5330 (0.5263/0.0347)	mem 48464MB
[2023-11-10 02:30:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1060/1251]	eta 0:07:37 lr 0.135426	time 2.3906 (2.3971)	model_time 2.3902 (2.3950)	loss 2.6853 (2.8680)	grad_norm 0.4986 (0.5250/0.0345)	mem 48464MB
[2023-11-10 02:31:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1070/1251]	eta 0:07:13 lr 0.135191	time 2.3968 (2.3971)	model_time 2.3965 (2.3950)	loss 2.5946 (2.8657)	grad_norm 0.4646 (0.5242/0.0352)	mem 48464MB
[2023-11-10 02:31:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1080/1251]	eta 0:06:49 lr 0.134956	time 2.3965 (2.3971)	model_time 2.3960 (2.3950)	loss 3.1258 (2.8675)	grad_norm 0.5286 (0.5247/0.0355)	mem 48464MB
[2023-11-10 02:31:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1090/1251]	eta 0:06:25 lr 0.134721	time 2.3916 (2.3970)	model_time 2.3911 (2.3950)	loss 3.0923 (2.8681)	grad_norm 0.5187 (0.5234/0.0357)	mem 48464MB
[2023-11-10 02:32:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1100/1251]	eta 0:06:01 lr 0.134485	time 2.3961 (2.3970)	model_time 2.3956 (2.3950)	loss 3.6890 (2.8660)	grad_norm 0.5559 (0.5230/0.0355)	mem 48464MB
[2023-11-10 02:32:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1110/1251]	eta 0:05:37 lr 0.134249	time 2.3932 (2.3970)	model_time 2.3928 (2.3949)	loss 2.6608 (2.8623)	grad_norm 0.4760 (0.5222/0.0355)	mem 48464MB
[2023-11-10 02:33:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1120/1251]	eta 0:05:13 lr 0.134013	time 2.3974 (2.3969)	model_time 2.3971 (2.3949)	loss 3.6468 (2.8610)	grad_norm 0.6072 (0.5218/0.0354)	mem 48464MB
[2023-11-10 02:33:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1130/1251]	eta 0:04:50 lr 0.133777	time 2.3933 (2.3969)	model_time 2.3928 (2.3949)	loss 1.8012 (2.8602)	grad_norm 0.5041 (0.5213/0.0355)	mem 48464MB
[2023-11-10 02:33:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1140/1251]	eta 0:04:26 lr 0.133540	time 2.3974 (2.3969)	model_time 2.3972 (2.3949)	loss 2.5275 (2.8603)	grad_norm 0.5142 (0.5209/0.0353)	mem 48464MB
[2023-11-10 02:34:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1150/1251]	eta 0:04:02 lr 0.133304	time 2.4006 (2.3969)	model_time 2.4001 (2.3949)	loss 2.5326 (2.8586)	grad_norm 0.4637 (0.5198/0.0357)	mem 48464MB
[2023-11-10 02:34:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1160/1251]	eta 0:03:38 lr 0.133067	time 2.3984 (2.3969)	model_time 2.3979 (2.3949)	loss 2.4597 (2.8577)	grad_norm 0.4603 (0.5189/0.0362)	mem 48464MB
[2023-11-10 02:35:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1170/1251]	eta 0:03:14 lr 0.132830	time 2.3905 (2.3969)	model_time 2.3902 (2.3949)	loss 2.4343 (2.8582)	grad_norm 0.5180 (0.5189/0.0363)	mem 48464MB
[2023-11-10 02:35:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1180/1251]	eta 0:02:50 lr 0.132592	time 2.3916 (2.3969)	model_time 2.3913 (2.3949)	loss 3.2041 (2.8593)	grad_norm 0.5424 (0.5180/0.0354)	mem 48464MB
[2023-11-10 02:35:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1190/1251]	eta 0:02:26 lr 0.132355	time 2.3955 (2.3968)	model_time 2.3951 (2.3949)	loss 3.0554 (2.8612)	grad_norm 0.4701 (0.5184/0.0360)	mem 48464MB
[2023-11-10 02:36:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1200/1251]	eta 0:02:02 lr 0.132117	time 2.3972 (2.3968)	model_time 2.3969 (2.3949)	loss 3.4001 (2.8626)	grad_norm 0.5352 (0.5190/0.0362)	mem 48464MB
[2023-11-10 02:36:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1210/1251]	eta 0:01:38 lr 0.131879	time 2.3920 (2.3968)	model_time 2.3917 (2.3949)	loss 3.1211 (2.8605)	grad_norm 0.5874 (0.5180/0.0366)	mem 48464MB
[2023-11-10 02:37:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1220/1251]	eta 0:01:14 lr 0.131641	time 2.3937 (2.3968)	model_time 2.3934 (2.3949)	loss 3.2872 (2.8617)	grad_norm 0.5166 (0.5171/0.0359)	mem 48464MB
[2023-11-10 02:37:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1230/1251]	eta 0:00:50 lr 0.131403	time 2.3914 (2.3968)	model_time 2.3910 (2.3949)	loss 3.1349 (2.8643)	grad_norm 0.5020 (0.5168/0.0356)	mem 48464MB
[2023-11-10 02:37:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1240/1251]	eta 0:00:26 lr 0.131164	time 2.3868 (2.3968)	model_time 2.3867 (2.3949)	loss 2.0523 (2.8632)	grad_norm 0.5029 (0.5164/0.0358)	mem 48464MB
[2023-11-10 02:38:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [3/10][1250/1251]	eta 0:00:02 lr 0.130926	time 2.3891 (2.3967)	model_time 2.3888 (2.3949)	loss 3.5551 (2.8652)	grad_norm 0.5334 (0.5164/0.0360)	mem 48464MB
[2023-11-10 02:38:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 515): INFO EPOCH 3 training takes 0:49:58
[2023-11-10 02:38:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_3.pth saving......
[2023-11-10 02:40:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_3.pth saved !!!
[2023-11-10 02:40:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.765 (3.765)	Loss 0.6519 (0.6519)	Acc@1 86.914 (86.914)	Acc@5 98.340 (98.340)	Mem 48464MB
[2023-11-10 02:40:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.244 (2.375)	Loss 0.7271 (0.6428)	Acc@1 86.133 (87.580)	Acc@5 97.656 (98.224)	Mem 48464MB
[2023-11-10 02:40:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.250 (2.315)	Loss 0.5693 (0.6435)	Acc@1 88.574 (87.393)	Acc@5 98.633 (98.247)	Mem 48464MB
[2023-11-10 02:41:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.252 (2.294)	Loss 0.7109 (0.6491)	Acc@1 85.449 (87.210)	Acc@5 97.656 (98.233)	Mem 48464MB
[2023-11-10 02:41:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.251 (2.284)	Loss 0.6880 (0.6502)	Acc@1 87.109 (87.245)	Acc@5 97.852 (98.214)	Mem 48464MB
[2023-11-10 02:41:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:3] * Acc@1 87.320 Acc@5 98.230
[2023-11-10 02:41:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 339): INFO Accuracy of the network on the 50000 test images: 87.3%
[2023-11-10 02:41:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saving......
[2023-11-10 02:43:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saved !!!
[2023-11-10 02:43:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 354): INFO Max accuracy: 87.32%
[2023-11-10 02:43:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.688 (3.688)	Loss 0.5986 (0.5986)	Acc@1 87.695 (87.695)	Acc@5 98.535 (98.535)	Mem 48464MB
[2023-11-10 02:44:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.246 (2.368)	Loss 0.6904 (0.5940)	Acc@1 86.426 (88.104)	Acc@5 97.754 (98.366)	Mem 48464MB
[2023-11-10 02:44:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.252 (2.312)	Loss 0.5200 (0.5944)	Acc@1 89.551 (87.979)	Acc@5 98.828 (98.442)	Mem 48464MB
[2023-11-10 02:44:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.250 (2.292)	Loss 0.6519 (0.5992)	Acc@1 86.523 (87.799)	Acc@5 97.852 (98.441)	Mem 48464MB
[2023-11-10 02:45:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.252 (2.282)	Loss 0.6543 (0.6011)	Acc@1 86.621 (87.883)	Acc@5 97.461 (98.423)	Mem 48464MB
[2023-11-10 02:45:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:3] * Acc@1 87.920 Acc@5 98.424
[2023-11-10 02:45:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 359): INFO Accuracy of the ema network on the 50000 test images: 87.9%
[2023-11-10 02:45:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 374): INFO Max ema accuracy: 87.96%
[2023-11-10 02:45:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][0/1251]	eta 1:54:34 lr 0.130902	time 5.4954 (5.4954)	model_time 3.0624 (3.0624)	loss 3.1221 (3.1221)	grad_norm 0.4875 (0.4875/0.0000)	mem 48464MB
[2023-11-10 02:46:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][10/1251]	eta 0:55:23 lr 0.130663	time 2.3939 (2.6778)	model_time 2.3936 (2.4563)	loss 3.4297 (2.8544)	grad_norm 0.4936 (0.5006/0.0297)	mem 48464MB
[2023-11-10 02:46:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][20/1251]	eta 0:52:08 lr 0.130424	time 2.3900 (2.5416)	model_time 2.3897 (2.4254)	loss 3.0638 (2.8163)	grad_norm 0.5525 (0.5124/0.0356)	mem 48464MB
[2023-11-10 02:46:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][30/1251]	eta 0:50:56 lr 0.130184	time 2.3909 (2.5032)	model_time 2.3906 (2.4244)	loss 1.8668 (2.7254)	grad_norm 0.5069 (0.5107/0.0334)	mem 48464MB
[2023-11-10 02:47:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][40/1251]	eta 0:49:58 lr 0.129945	time 2.3910 (2.4760)	model_time 2.3908 (2.4163)	loss 2.7527 (2.7508)	grad_norm 0.5549 (0.5105/0.0327)	mem 48464MB
[2023-11-10 02:47:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][50/1251]	eta 0:49:13 lr 0.129705	time 2.3899 (2.4596)	model_time 2.3896 (2.4116)	loss 3.4709 (2.7701)	grad_norm 0.5453 (0.5080/0.0321)	mem 48464MB
[2023-11-10 02:48:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][60/1251]	eta 0:48:35 lr 0.129465	time 2.3927 (2.4482)	model_time 2.3925 (2.4080)	loss 2.6271 (2.8175)	grad_norm 0.5448 (0.5075/0.0339)	mem 48464MB
[2023-11-10 02:48:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][70/1251]	eta 0:48:02 lr 0.129225	time 2.3977 (2.4405)	model_time 2.3974 (2.4059)	loss 2.8142 (2.7889)	grad_norm 0.5687 (0.5083/0.0356)	mem 48464MB
[2023-11-10 02:48:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][80/1251]	eta 0:47:31 lr 0.128985	time 2.3918 (2.4347)	model_time 2.3915 (2.4043)	loss 1.5975 (2.7669)	grad_norm 0.4663 (0.5097/0.0359)	mem 48464MB
[2023-11-10 02:49:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][90/1251]	eta 0:47:01 lr 0.128744	time 2.3900 (2.4300)	model_time 2.3898 (2.4029)	loss 2.8584 (2.7565)	grad_norm 0.4631 (0.5081/0.0349)	mem 48464MB
[2023-11-10 02:49:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][100/1251]	eta 0:46:32 lr 0.128504	time 2.3916 (2.4264)	model_time 2.3914 (2.4019)	loss 3.7485 (2.7913)	grad_norm 0.5781 (0.5085/0.0354)	mem 48464MB
[2023-11-10 02:50:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][110/1251]	eta 0:46:05 lr 0.128263	time 2.3916 (2.4234)	model_time 2.3912 (2.4010)	loss 2.9663 (2.7865)	grad_norm 0.5204 (0.5085/0.0353)	mem 48464MB
[2023-11-10 02:50:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][120/1251]	eta 0:45:37 lr 0.128022	time 2.3922 (2.4209)	model_time 2.3920 (2.4003)	loss 3.6989 (2.7782)	grad_norm 0.5174 (0.5080/0.0350)	mem 48464MB
[2023-11-10 02:50:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][130/1251]	eta 0:45:11 lr 0.127781	time 2.3897 (2.4186)	model_time 2.3895 (2.3996)	loss 3.1981 (2.7969)	grad_norm 0.4877 (0.5066/0.0351)	mem 48464MB
[2023-11-10 02:51:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][140/1251]	eta 0:44:44 lr 0.127540	time 2.3921 (2.4167)	model_time 2.3920 (2.3991)	loss 3.4613 (2.8133)	grad_norm 0.4863 (0.5070/0.0361)	mem 48464MB
[2023-11-10 02:51:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][150/1251]	eta 0:44:19 lr 0.127298	time 2.3956 (2.4151)	model_time 2.3953 (2.3986)	loss 2.7274 (2.8084)	grad_norm 0.5112 (0.5069/0.0364)	mem 48464MB
[2023-11-10 02:52:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][160/1251]	eta 0:43:53 lr 0.127056	time 2.3907 (2.4138)	model_time 2.3904 (2.3983)	loss 2.8823 (2.8290)	grad_norm 0.5371 (0.5080/0.0360)	mem 48464MB
[2023-11-10 02:52:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][170/1251]	eta 0:43:27 lr 0.126815	time 2.3956 (2.4125)	model_time 2.3954 (2.3979)	loss 3.1398 (2.8238)	grad_norm 0.4932 (0.5079/0.0353)	mem 48464MB
[2023-11-10 02:52:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][180/1251]	eta 0:43:02 lr 0.126573	time 2.3926 (2.4114)	model_time 2.3924 (2.3976)	loss 3.3058 (2.8339)	grad_norm 0.5152 (0.5073/0.0354)	mem 48464MB
[2023-11-10 02:53:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][190/1251]	eta 0:42:37 lr 0.126330	time 2.3908 (2.4104)	model_time 2.3906 (2.3973)	loss 2.3816 (2.8198)	grad_norm 0.5539 (0.5075/0.0351)	mem 48464MB
[2023-11-10 02:53:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][200/1251]	eta 0:42:12 lr 0.126088	time 2.3919 (2.4095)	model_time 2.3917 (2.3970)	loss 3.1174 (2.8289)	grad_norm 0.5254 (0.5077/0.0354)	mem 48464MB
[2023-11-10 02:54:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][210/1251]	eta 0:41:47 lr 0.125846	time 2.3909 (2.4086)	model_time 2.3907 (2.3967)	loss 3.7664 (2.8441)	grad_norm 0.5395 (0.5080/0.0350)	mem 48464MB
[2023-11-10 02:54:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][220/1251]	eta 0:41:22 lr 0.125603	time 2.3910 (2.4079)	model_time 2.3907 (2.3965)	loss 2.4479 (2.8477)	grad_norm 0.4966 (0.5078/0.0345)	mem 48464MB
[2023-11-10 02:54:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][230/1251]	eta 0:40:57 lr 0.125360	time 2.3918 (2.4071)	model_time 2.3916 (2.3962)	loss 3.4221 (2.8421)	grad_norm 0.4807 (0.5077/0.0343)	mem 48464MB
[2023-11-10 02:55:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][240/1251]	eta 0:40:34 lr 0.125117	time 2.3903 (2.4080)	model_time 2.3900 (2.3975)	loss 3.5324 (2.8584)	grad_norm 0.4832 (0.5080/0.0347)	mem 48464MB
[2023-11-10 02:55:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][250/1251]	eta 0:40:09 lr 0.124874	time 2.3940 (2.4074)	model_time 2.3938 (2.3973)	loss 3.5684 (2.8600)	grad_norm 0.5282 (0.5087/0.0354)	mem 48464MB
[2023-11-10 02:56:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][260/1251]	eta 0:39:45 lr 0.124631	time 2.3941 (2.4068)	model_time 2.3938 (2.3971)	loss 3.7048 (2.8592)	grad_norm 0.5320 (0.5089/0.0350)	mem 48464MB
[2023-11-10 02:56:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][270/1251]	eta 0:39:20 lr 0.124387	time 2.3922 (2.4063)	model_time 2.3919 (2.3970)	loss 2.8414 (2.8596)	grad_norm 0.4820 (0.5088/0.0351)	mem 48464MB
[2023-11-10 02:56:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][280/1251]	eta 0:38:56 lr 0.124143	time 2.3866 (2.4058)	model_time 2.3863 (2.3968)	loss 3.5679 (2.8653)	grad_norm 0.5549 (0.5089/0.0347)	mem 48464MB
[2023-11-10 02:57:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][290/1251]	eta 0:38:31 lr 0.123900	time 2.3924 (2.4053)	model_time 2.3922 (2.3966)	loss 1.7271 (2.8626)	grad_norm 0.4733 (0.5086/0.0346)	mem 48464MB
[2023-11-10 02:57:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][300/1251]	eta 0:38:07 lr 0.123656	time 2.3937 (2.4048)	model_time 2.3935 (2.3964)	loss 2.6461 (2.8695)	grad_norm 0.4775 (0.5086/0.0342)	mem 48464MB
[2023-11-10 02:57:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][310/1251]	eta 0:37:42 lr 0.123412	time 2.3923 (2.4044)	model_time 2.3921 (2.3963)	loss 2.7537 (2.8602)	grad_norm 0.5755 (0.5087/0.0342)	mem 48464MB
[2023-11-10 02:58:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][320/1251]	eta 0:37:18 lr 0.123167	time 2.3931 (2.4040)	model_time 2.3928 (2.3961)	loss 3.2995 (2.8558)	grad_norm 0.4673 (0.5074/0.0340)	mem 48464MB
[2023-11-10 02:58:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][330/1251]	eta 0:36:53 lr 0.122923	time 2.3945 (2.4037)	model_time 2.3943 (2.3960)	loss 2.0350 (2.8498)	grad_norm 0.4592 (0.5073/0.0343)	mem 48464MB
[2023-11-10 02:59:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][340/1251]	eta 0:36:29 lr 0.122679	time 2.3926 (2.4033)	model_time 2.3924 (2.3958)	loss 2.8934 (2.8515)	grad_norm 0.5341 (0.5075/0.0345)	mem 48464MB
[2023-11-10 02:59:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][350/1251]	eta 0:36:05 lr 0.122434	time 2.3890 (2.4030)	model_time 2.3888 (2.3957)	loss 3.2214 (2.8540)	grad_norm 0.4822 (0.5071/0.0349)	mem 48464MB
[2023-11-10 02:59:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][360/1251]	eta 0:35:40 lr 0.122189	time 2.3899 (2.4027)	model_time 2.3897 (2.3956)	loss 2.8146 (2.8510)	grad_norm 0.5302 (0.5073/0.0347)	mem 48464MB
[2023-11-10 03:00:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][370/1251]	eta 0:35:16 lr 0.121944	time 2.3903 (2.4024)	model_time 2.3901 (2.3955)	loss 3.0038 (2.8477)	grad_norm 0.5036 (0.5061/0.0347)	mem 48464MB
[2023-11-10 03:00:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][380/1251]	eta 0:34:52 lr 0.121699	time 2.3887 (2.4021)	model_time 2.3885 (2.3954)	loss 2.9570 (2.8432)	grad_norm 0.5181 (0.5052/0.0343)	mem 48464MB
[2023-11-10 03:01:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][390/1251]	eta 0:34:27 lr 0.121454	time 2.3899 (2.4018)	model_time 2.3897 (2.3952)	loss 3.0132 (2.8418)	grad_norm 0.5629 (0.5058/0.0349)	mem 48464MB
[2023-11-10 03:01:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][400/1251]	eta 0:34:03 lr 0.121209	time 2.3950 (2.4016)	model_time 2.3947 (2.3951)	loss 3.1035 (2.8474)	grad_norm 0.5084 (0.5057/0.0346)	mem 48464MB
[2023-11-10 03:01:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][410/1251]	eta 0:33:39 lr 0.120963	time 2.3898 (2.4014)	model_time 2.3894 (2.3951)	loss 3.2182 (2.8487)	grad_norm 0.4765 (0.5062/0.0344)	mem 48464MB
[2023-11-10 03:02:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][420/1251]	eta 0:33:15 lr 0.120717	time 2.3939 (2.4011)	model_time 2.3936 (2.3950)	loss 2.6120 (2.8420)	grad_norm 0.4775 (0.5061/0.0344)	mem 48464MB
[2023-11-10 03:02:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][430/1251]	eta 0:32:51 lr 0.120472	time 2.3899 (2.4012)	model_time 2.3893 (2.3952)	loss 3.0508 (2.8421)	grad_norm 0.5619 (0.5062/0.0348)	mem 48464MB
[2023-11-10 03:03:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][440/1251]	eta 0:32:27 lr 0.120226	time 2.3934 (2.4011)	model_time 2.3931 (2.3952)	loss 3.3541 (2.8416)	grad_norm 0.5081 (0.5066/0.0346)	mem 48464MB
[2023-11-10 03:03:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][450/1251]	eta 0:32:03 lr 0.119980	time 2.3955 (2.4009)	model_time 2.3952 (2.3951)	loss 2.9009 (2.8434)	grad_norm 0.4892 (0.5069/0.0342)	mem 48464MB
[2023-11-10 03:03:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][460/1251]	eta 0:31:38 lr 0.119734	time 2.3919 (2.4007)	model_time 2.3917 (2.3950)	loss 2.9544 (2.8464)	grad_norm 0.5112 (0.5065/0.0342)	mem 48464MB
[2023-11-10 03:04:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][470/1251]	eta 0:31:14 lr 0.119487	time 2.3891 (2.4005)	model_time 2.3888 (2.3949)	loss 3.2792 (2.8483)	grad_norm 0.5122 (0.5064/0.0341)	mem 48464MB
[2023-11-10 03:04:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][480/1251]	eta 0:30:50 lr 0.119241	time 2.3887 (2.4003)	model_time 2.3885 (2.3948)	loss 1.8687 (2.8430)	grad_norm 0.4961 (0.5062/0.0340)	mem 48464MB
[2023-11-10 03:05:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][490/1251]	eta 0:30:26 lr 0.118995	time 2.3902 (2.4001)	model_time 2.3899 (2.3948)	loss 3.1493 (2.8460)	grad_norm 0.5588 (0.5057/0.0344)	mem 48464MB
[2023-11-10 03:05:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][500/1251]	eta 0:30:02 lr 0.118748	time 2.3896 (2.3999)	model_time 2.3894 (2.3947)	loss 3.5107 (2.8449)	grad_norm 0.5046 (0.5052/0.0342)	mem 48464MB
[2023-11-10 03:05:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][510/1251]	eta 0:29:38 lr 0.118501	time 2.3897 (2.3997)	model_time 2.3894 (2.3946)	loss 2.7410 (2.8417)	grad_norm 0.5120 (0.5047/0.0342)	mem 48464MB
[2023-11-10 03:06:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][520/1251]	eta 0:29:14 lr 0.118254	time 2.3940 (2.3996)	model_time 2.3938 (2.3946)	loss 3.3761 (2.8453)	grad_norm 0.4893 (0.5053/0.0350)	mem 48464MB
[2023-11-10 03:06:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][530/1251]	eta 0:28:49 lr 0.118007	time 2.3909 (2.3994)	model_time 2.3907 (2.3945)	loss 1.7729 (2.8386)	grad_norm 0.5500 (0.5049/0.0351)	mem 48464MB
[2023-11-10 03:07:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][540/1251]	eta 0:28:25 lr 0.117760	time 2.3896 (2.3993)	model_time 2.3894 (2.3944)	loss 1.5714 (2.8368)	grad_norm 0.4961 (0.5039/0.0346)	mem 48464MB
[2023-11-10 03:07:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][550/1251]	eta 0:28:01 lr 0.117513	time 2.3896 (2.3991)	model_time 2.3894 (2.3944)	loss 2.9493 (2.8340)	grad_norm 0.5238 (0.5031/0.0337)	mem 48464MB
[2023-11-10 03:07:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][560/1251]	eta 0:27:37 lr 0.117266	time 2.3903 (2.3990)	model_time 2.3901 (2.3943)	loss 2.1782 (2.8351)	grad_norm 0.4598 (0.5026/0.0340)	mem 48464MB
[2023-11-10 03:08:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][570/1251]	eta 0:27:13 lr 0.117018	time 2.3923 (2.3989)	model_time 2.3921 (2.3942)	loss 1.9483 (2.8319)	grad_norm 0.4980 (0.5027/0.0338)	mem 48464MB
[2023-11-10 03:08:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][580/1251]	eta 0:26:49 lr 0.116771	time 2.3950 (2.3988)	model_time 2.3947 (2.3942)	loss 3.1465 (2.8324)	grad_norm 0.5401 (0.5028/0.0342)	mem 48464MB
[2023-11-10 03:09:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][590/1251]	eta 0:26:25 lr 0.116523	time 2.3942 (2.3987)	model_time 2.3938 (2.3942)	loss 3.5969 (2.8367)	grad_norm 0.4895 (0.5027/0.0340)	mem 48464MB
[2023-11-10 03:09:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][600/1251]	eta 0:26:01 lr 0.116276	time 2.3917 (2.3986)	model_time 2.3914 (2.3942)	loss 2.3261 (2.8344)	grad_norm 0.4645 (0.5028/0.0343)	mem 48464MB
[2023-11-10 03:09:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][610/1251]	eta 0:25:37 lr 0.116028	time 2.3918 (2.3985)	model_time 2.3916 (2.3942)	loss 3.2528 (2.8359)	grad_norm 0.4861 (0.5023/0.0342)	mem 48464MB
[2023-11-10 03:10:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][620/1251]	eta 0:25:13 lr 0.115780	time 2.3945 (2.3984)	model_time 2.3943 (2.3941)	loss 3.3249 (2.8361)	grad_norm 0.4721 (0.5030/0.0344)	mem 48464MB
[2023-11-10 03:10:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][630/1251]	eta 0:24:49 lr 0.115532	time 2.3899 (2.3983)	model_time 2.3896 (2.3941)	loss 3.0566 (2.8411)	grad_norm 0.4622 (0.5027/0.0344)	mem 48464MB
[2023-11-10 03:11:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][640/1251]	eta 0:24:25 lr 0.115284	time 2.3896 (2.3982)	model_time 2.3894 (2.3941)	loss 1.7484 (2.8398)	grad_norm 0.4898 (0.5023/0.0344)	mem 48464MB
[2023-11-10 03:11:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][650/1251]	eta 0:24:01 lr 0.115035	time 2.3926 (2.3984)	model_time 2.3924 (2.3942)	loss 2.9198 (2.8375)	grad_norm 0.5419 (0.5032/0.0344)	mem 48464MB
[2023-11-10 03:11:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][660/1251]	eta 0:23:37 lr 0.114787	time 2.3940 (2.3983)	model_time 2.3936 (2.3943)	loss 2.6277 (2.8367)	grad_norm 0.5123 (0.5027/0.0344)	mem 48464MB
[2023-11-10 03:12:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][670/1251]	eta 0:23:13 lr 0.114539	time 2.3939 (2.3983)	model_time 2.3934 (2.3943)	loss 3.4429 (2.8361)	grad_norm 0.4949 (0.5033/0.0339)	mem 48464MB
[2023-11-10 03:12:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][680/1251]	eta 0:22:49 lr 0.114290	time 2.3966 (2.3982)	model_time 2.3962 (2.3943)	loss 2.7613 (2.8370)	grad_norm 0.4901 (0.5044/0.0342)	mem 48464MB
[2023-11-10 03:13:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][690/1251]	eta 0:22:25 lr 0.114042	time 2.3897 (2.3982)	model_time 2.3894 (2.3943)	loss 1.4791 (2.8382)	grad_norm 0.5330 (0.5039/0.0336)	mem 48464MB
[2023-11-10 03:13:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][700/1251]	eta 0:22:01 lr 0.113793	time 2.3973 (2.3981)	model_time 2.3969 (2.3943)	loss 2.6052 (2.8398)	grad_norm 0.5363 (0.5040/0.0338)	mem 48464MB
[2023-11-10 03:13:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][710/1251]	eta 0:21:37 lr 0.113544	time 2.3948 (2.3981)	model_time 2.3944 (2.3943)	loss 3.4635 (2.8379)	grad_norm 0.4877 (0.5031/0.0336)	mem 48464MB
[2023-11-10 03:14:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][720/1251]	eta 0:21:13 lr 0.113295	time 2.3964 (2.3980)	model_time 2.3960 (2.3943)	loss 2.2716 (2.8342)	grad_norm 0.4441 (0.5026/0.0336)	mem 48464MB
[2023-11-10 03:14:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][730/1251]	eta 0:20:49 lr 0.113046	time 2.3945 (2.3980)	model_time 2.3941 (2.3943)	loss 3.1187 (2.8327)	grad_norm 0.4553 (0.5021/0.0333)	mem 48464MB
[2023-11-10 03:15:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][740/1251]	eta 0:20:25 lr 0.112797	time 2.3978 (2.3980)	model_time 2.3973 (2.3943)	loss 2.7117 (2.8321)	grad_norm 0.4968 (0.5014/0.0323)	mem 48464MB
[2023-11-10 03:15:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][750/1251]	eta 0:20:01 lr 0.112548	time 2.4077 (2.3980)	model_time 2.4072 (2.3944)	loss 2.8294 (2.8339)	grad_norm 0.4960 (0.5006/0.0327)	mem 48464MB
[2023-11-10 03:15:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][760/1251]	eta 0:19:37 lr 0.112299	time 2.3923 (2.3979)	model_time 2.3919 (2.3944)	loss 2.4051 (2.8366)	grad_norm 0.5193 (0.5000/0.0329)	mem 48464MB
[2023-11-10 03:16:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][770/1251]	eta 0:19:13 lr 0.112050	time 2.3949 (2.3979)	model_time 2.3945 (2.3944)	loss 2.0415 (2.8353)	grad_norm 0.4862 (0.4998/0.0332)	mem 48464MB
[2023-11-10 03:16:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][780/1251]	eta 0:18:49 lr 0.111800	time 2.3950 (2.3980)	model_time 2.3947 (2.3945)	loss 3.0064 (2.8346)	grad_norm 0.4869 (0.5005/0.0332)	mem 48464MB
[2023-11-10 03:17:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][790/1251]	eta 0:18:25 lr 0.111551	time 2.3930 (2.3979)	model_time 2.3926 (2.3945)	loss 3.7087 (2.8326)	grad_norm 0.4719 (0.5001/0.0329)	mem 48464MB
[2023-11-10 03:17:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][800/1251]	eta 0:18:01 lr 0.111302	time 2.3960 (2.3979)	model_time 2.3957 (2.3945)	loss 3.5359 (2.8329)	grad_norm 0.4680 (0.4996/0.0330)	mem 48464MB
[2023-11-10 03:17:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][810/1251]	eta 0:17:37 lr 0.111052	time 2.3930 (2.3978)	model_time 2.3927 (2.3944)	loss 2.6002 (2.8322)	grad_norm 0.5144 (0.4999/0.0333)	mem 48464MB
[2023-11-10 03:18:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][820/1251]	eta 0:17:13 lr 0.110802	time 2.3912 (2.3978)	model_time 2.3909 (2.3944)	loss 2.6284 (2.8336)	grad_norm 0.4434 (0.4991/0.0329)	mem 48464MB
[2023-11-10 03:18:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][830/1251]	eta 0:16:49 lr 0.110553	time 2.3906 (2.3977)	model_time 2.3904 (2.3944)	loss 3.0853 (2.8344)	grad_norm 0.4997 (0.4994/0.0332)	mem 48464MB
[2023-11-10 03:19:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][840/1251]	eta 0:16:25 lr 0.110303	time 2.3902 (2.3977)	model_time 2.3898 (2.3944)	loss 3.6632 (2.8346)	grad_norm 0.5138 (0.4998/0.0332)	mem 48464MB
[2023-11-10 03:19:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][850/1251]	eta 0:16:01 lr 0.110053	time 2.3943 (2.3977)	model_time 2.3938 (2.3944)	loss 2.9421 (2.8334)	grad_norm 0.5138 (0.4992/0.0334)	mem 48464MB
[2023-11-10 03:19:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][860/1251]	eta 0:15:37 lr 0.109803	time 2.3946 (2.3976)	model_time 2.3942 (2.3944)	loss 3.6691 (2.8373)	grad_norm 0.5267 (0.4996/0.0339)	mem 48464MB
[2023-11-10 03:20:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][870/1251]	eta 0:15:13 lr 0.109553	time 2.3925 (2.3976)	model_time 2.3922 (2.3944)	loss 3.8059 (2.8443)	grad_norm 0.5519 (0.4997/0.0338)	mem 48464MB
[2023-11-10 03:20:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][880/1251]	eta 0:14:49 lr 0.109303	time 2.3927 (2.3975)	model_time 2.3924 (2.3944)	loss 3.5096 (2.8456)	grad_norm 0.4601 (0.4988/0.0335)	mem 48464MB
[2023-11-10 03:21:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][890/1251]	eta 0:14:25 lr 0.109053	time 2.3956 (2.3975)	model_time 2.3953 (2.3944)	loss 2.7708 (2.8450)	grad_norm 0.4784 (0.4988/0.0335)	mem 48464MB
[2023-11-10 03:21:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][900/1251]	eta 0:14:01 lr 0.108803	time 2.3921 (2.3975)	model_time 2.3913 (2.3944)	loss 3.2670 (2.8425)	grad_norm 0.4816 (0.4991/0.0341)	mem 48464MB
[2023-11-10 03:21:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][910/1251]	eta 0:13:37 lr 0.108553	time 2.3934 (2.3975)	model_time 2.3931 (2.3944)	loss 2.2891 (2.8439)	grad_norm 0.5425 (0.4997/0.0340)	mem 48464MB
[2023-11-10 03:22:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][920/1251]	eta 0:13:13 lr 0.108303	time 2.3934 (2.3974)	model_time 2.3931 (2.3944)	loss 2.9526 (2.8436)	grad_norm 0.4965 (0.4989/0.0335)	mem 48464MB
[2023-11-10 03:22:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][930/1251]	eta 0:12:49 lr 0.108052	time 2.3923 (2.3974)	model_time 2.3920 (2.3944)	loss 3.1102 (2.8443)	grad_norm 0.4960 (0.4991/0.0335)	mem 48464MB
[2023-11-10 03:23:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][940/1251]	eta 0:12:25 lr 0.107802	time 2.3938 (2.3974)	model_time 2.3934 (2.3944)	loss 2.2063 (2.8458)	grad_norm 0.4760 (0.4991/0.0329)	mem 48464MB
[2023-11-10 03:23:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][950/1251]	eta 0:12:01 lr 0.107552	time 2.3941 (2.3973)	model_time 2.3937 (2.3944)	loss 2.1933 (2.8432)	grad_norm 0.4486 (0.4986/0.0327)	mem 48464MB
[2023-11-10 03:23:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][960/1251]	eta 0:11:37 lr 0.107301	time 2.3918 (2.3973)	model_time 2.3916 (2.3944)	loss 3.7442 (2.8407)	grad_norm 0.4896 (0.4984/0.0323)	mem 48464MB
[2023-11-10 03:24:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][970/1251]	eta 0:11:13 lr 0.107051	time 2.3937 (2.3972)	model_time 2.3934 (2.3943)	loss 3.1842 (2.8434)	grad_norm 0.5128 (0.4990/0.0326)	mem 48464MB
[2023-11-10 03:24:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][980/1251]	eta 0:10:49 lr 0.106800	time 2.3970 (2.3972)	model_time 2.3967 (2.3943)	loss 2.5392 (2.8447)	grad_norm 0.5025 (0.4987/0.0322)	mem 48464MB
[2023-11-10 03:25:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][990/1251]	eta 0:10:25 lr 0.106550	time 2.3950 (2.3975)	model_time 2.3947 (2.3946)	loss 3.2436 (2.8447)	grad_norm 0.4514 (0.4984/0.0325)	mem 48464MB
[2023-11-10 03:25:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1000/1251]	eta 0:10:01 lr 0.106299	time 2.3946 (2.3975)	model_time 2.3943 (2.3946)	loss 3.0021 (2.8455)	grad_norm 0.5053 (0.4977/0.0321)	mem 48464MB
[2023-11-10 03:25:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1010/1251]	eta 0:09:37 lr 0.106048	time 2.3979 (2.3974)	model_time 2.3976 (2.3946)	loss 3.3381 (2.8482)	grad_norm 0.5178 (0.4976/0.0320)	mem 48464MB
[2023-11-10 03:26:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1020/1251]	eta 0:09:13 lr 0.105798	time 2.3929 (2.3974)	model_time 2.3925 (2.3946)	loss 2.2815 (2.8464)	grad_norm 0.5825 (0.4982/0.0326)	mem 48464MB
[2023-11-10 03:26:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1030/1251]	eta 0:08:49 lr 0.105547	time 2.3944 (2.3974)	model_time 2.3940 (2.3946)	loss 3.0012 (2.8436)	grad_norm 0.4708 (0.4983/0.0325)	mem 48464MB
[2023-11-10 03:27:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1040/1251]	eta 0:08:25 lr 0.105296	time 2.3934 (2.3973)	model_time 2.3931 (2.3946)	loss 3.5195 (2.8439)	grad_norm 0.5229 (0.4975/0.0329)	mem 48464MB
[2023-11-10 03:27:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1050/1251]	eta 0:08:01 lr 0.105045	time 2.3958 (2.3973)	model_time 2.3954 (2.3946)	loss 2.7114 (2.8446)	grad_norm 0.4794 (0.4977/0.0327)	mem 48464MB
[2023-11-10 03:27:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1060/1251]	eta 0:07:37 lr 0.104795	time 2.3963 (2.3973)	model_time 2.3960 (2.3946)	loss 3.5738 (2.8448)	grad_norm 0.5017 (0.4981/0.0328)	mem 48464MB
[2023-11-10 03:28:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1070/1251]	eta 0:07:13 lr 0.104544	time 2.3971 (2.3973)	model_time 2.3966 (2.3946)	loss 3.7582 (2.8470)	grad_norm 0.5262 (0.4978/0.0334)	mem 48464MB
[2023-11-10 03:28:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1080/1251]	eta 0:06:49 lr 0.104293	time 2.3958 (2.3973)	model_time 2.3955 (2.3946)	loss 2.2400 (2.8468)	grad_norm 0.4786 (0.4973/0.0332)	mem 48464MB
[2023-11-10 03:29:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1090/1251]	eta 0:06:25 lr 0.104042	time 2.3929 (2.3972)	model_time 2.3926 (2.3946)	loss 2.2651 (2.8469)	grad_norm 0.4983 (0.4978/0.0329)	mem 48464MB
[2023-11-10 03:29:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1100/1251]	eta 0:06:01 lr 0.103791	time 2.3980 (2.3972)	model_time 2.3977 (2.3946)	loss 2.2348 (2.8468)	grad_norm 0.4676 (0.4982/0.0328)	mem 48464MB
[2023-11-10 03:29:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1110/1251]	eta 0:05:38 lr 0.103540	time 2.3940 (2.3972)	model_time 2.3931 (2.3946)	loss 3.1954 (2.8432)	grad_norm 0.5121 (0.4978/0.0331)	mem 48464MB
[2023-11-10 03:30:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1120/1251]	eta 0:05:14 lr 0.103289	time 2.3906 (2.3972)	model_time 2.3901 (2.3946)	loss 2.6246 (2.8426)	grad_norm 0.4492 (0.4977/0.0330)	mem 48464MB
[2023-11-10 03:30:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1130/1251]	eta 0:04:50 lr 0.103038	time 2.3909 (2.3972)	model_time 2.3905 (2.3946)	loss 2.1888 (2.8430)	grad_norm 0.5265 (0.4976/0.0328)	mem 48464MB
[2023-11-10 03:31:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1140/1251]	eta 0:04:26 lr 0.102787	time 2.3956 (2.3971)	model_time 2.3953 (2.3946)	loss 2.7064 (2.8406)	grad_norm 0.4955 (0.4975/0.0328)	mem 48464MB
[2023-11-10 03:31:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1150/1251]	eta 0:04:02 lr 0.102536	time 2.3959 (2.3971)	model_time 2.3956 (2.3946)	loss 3.0663 (2.8415)	grad_norm 0.5011 (0.4977/0.0328)	mem 48464MB
[2023-11-10 03:31:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1160/1251]	eta 0:03:38 lr 0.102285	time 2.3897 (2.3971)	model_time 2.3893 (2.3946)	loss 3.0515 (2.8412)	grad_norm 0.4945 (0.4972/0.0323)	mem 48464MB
[2023-11-10 03:32:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1170/1251]	eta 0:03:14 lr 0.102034	time 2.3977 (2.3971)	model_time 2.3973 (2.3946)	loss 3.6027 (2.8429)	grad_norm 0.4874 (0.4965/0.0325)	mem 48464MB
[2023-11-10 03:32:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1180/1251]	eta 0:02:50 lr 0.101783	time 2.3902 (2.3972)	model_time 2.3898 (2.3947)	loss 2.9673 (2.8420)	grad_norm 0.5463 (0.4967/0.0327)	mem 48464MB
[2023-11-10 03:33:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1190/1251]	eta 0:02:26 lr 0.101532	time 2.3969 (2.3971)	model_time 2.3966 (2.3947)	loss 2.3795 (2.8424)	grad_norm 0.4457 (0.4964/0.0331)	mem 48464MB
[2023-11-10 03:33:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1200/1251]	eta 0:02:02 lr 0.101281	time 2.3923 (2.3971)	model_time 2.3920 (2.3947)	loss 2.2325 (2.8435)	grad_norm 0.4968 (0.4959/0.0325)	mem 48464MB
[2023-11-10 03:33:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1210/1251]	eta 0:01:38 lr 0.101030	time 2.3932 (2.3971)	model_time 2.3929 (2.3947)	loss 1.8421 (2.8396)	grad_norm 0.4426 (0.4957/0.0328)	mem 48464MB
[2023-11-10 03:34:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1220/1251]	eta 0:01:14 lr 0.100778	time 2.3913 (2.3971)	model_time 2.3910 (2.3947)	loss 2.7592 (2.8399)	grad_norm 0.5427 (0.4964/0.0336)	mem 48464MB
[2023-11-10 03:34:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1230/1251]	eta 0:00:50 lr 0.100527	time 2.3937 (2.3971)	model_time 2.3934 (2.3947)	loss 2.8780 (2.8385)	grad_norm 0.4577 (0.4959/0.0334)	mem 48464MB
[2023-11-10 03:35:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1240/1251]	eta 0:00:26 lr 0.100276	time 2.3904 (2.3971)	model_time 2.3899 (2.3947)	loss 2.0292 (2.8401)	grad_norm 0.4658 (0.4956/0.0334)	mem 48464MB
[2023-11-10 03:35:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [4/10][1250/1251]	eta 0:00:02 lr 0.100025	time 2.3931 (2.3970)	model_time 2.3930 (2.3947)	loss 2.2390 (2.8427)	grad_norm 0.5547 (0.4958/0.0333)	mem 48464MB
[2023-11-10 03:35:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 515): INFO EPOCH 4 training takes 0:49:58
[2023-11-10 03:35:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_4.pth saving......
[2023-11-10 03:37:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_4.pth saved !!!
[2023-11-10 03:37:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.767 (3.767)	Loss 0.6152 (0.6152)	Acc@1 87.598 (87.598)	Acc@5 98.730 (98.730)	Mem 48464MB
[2023-11-10 03:37:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.247 (2.376)	Loss 0.7227 (0.6264)	Acc@1 85.742 (87.766)	Acc@5 97.852 (98.349)	Mem 48464MB
[2023-11-10 03:38:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.254 (2.317)	Loss 0.5576 (0.6271)	Acc@1 88.965 (87.742)	Acc@5 98.828 (98.377)	Mem 48464MB
[2023-11-10 03:38:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.250 (2.296)	Loss 0.6812 (0.6330)	Acc@1 86.523 (87.566)	Acc@5 98.242 (98.368)	Mem 48464MB
[2023-11-10 03:38:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.251 (2.285)	Loss 0.6812 (0.6349)	Acc@1 86.914 (87.574)	Acc@5 97.949 (98.354)	Mem 48464MB
[2023-11-10 03:39:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:4] * Acc@1 87.636 Acc@5 98.352
[2023-11-10 03:39:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 339): INFO Accuracy of the network on the 50000 test images: 87.6%
[2023-11-10 03:39:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saving......
[2023-11-10 03:40:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saved !!!
[2023-11-10 03:40:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 354): INFO Max accuracy: 87.64%
[2023-11-10 03:40:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.796 (3.796)	Loss 0.5986 (0.5986)	Acc@1 88.184 (88.184)	Acc@5 98.730 (98.730)	Mem 48464MB
[2023-11-10 03:41:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.245 (2.379)	Loss 0.6880 (0.5957)	Acc@1 86.230 (88.406)	Acc@5 97.754 (98.402)	Mem 48464MB
[2023-11-10 03:41:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.250 (2.317)	Loss 0.5288 (0.5958)	Acc@1 89.746 (88.323)	Acc@5 98.730 (98.461)	Mem 48464MB
[2023-11-10 03:41:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.252 (2.296)	Loss 0.6587 (0.6014)	Acc@1 86.133 (88.108)	Acc@5 98.047 (98.434)	Mem 48464MB
[2023-11-10 03:42:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.251 (2.286)	Loss 0.6528 (0.6034)	Acc@1 86.816 (88.062)	Acc@5 97.852 (98.423)	Mem 48464MB
[2023-11-10 03:42:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:4] * Acc@1 88.086 Acc@5 98.424
[2023-11-10 03:42:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 359): INFO Accuracy of the ema network on the 50000 test images: 88.1%
[2023-11-10 03:42:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saving......
[2023-11-10 03:44:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saved !!!
[2023-11-10 03:44:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 374): INFO Max ema accuracy: 88.09%
[2023-11-10 03:44:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][0/1251]	eta 1:19:00 lr 0.100000	time 3.7893 (3.7893)	model_time 2.3958 (2.3958)	loss 2.9702 (2.9702)	grad_norm 0.5364 (0.5364/0.0000)	mem 48464MB
[2023-11-10 03:44:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][10/1251]	eta 0:51:53 lr 0.099749	time 2.3896 (2.5089)	model_time 2.3868 (2.3817)	loss 3.4891 (3.0500)	grad_norm 0.5114 (0.4915/0.0263)	mem 48464MB
[2023-11-10 03:45:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][20/1251]	eta 0:50:21 lr 0.099498	time 2.3903 (2.4544)	model_time 2.3899 (2.3875)	loss 2.8292 (2.9287)	grad_norm 0.5392 (0.4977/0.0305)	mem 48464MB
[2023-11-10 03:45:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][30/1251]	eta 0:49:32 lr 0.099247	time 2.3928 (2.4345)	model_time 2.3924 (2.3890)	loss 2.4497 (2.9056)	grad_norm 0.4712 (0.4950/0.0291)	mem 48464MB
[2023-11-10 03:46:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][40/1251]	eta 0:48:56 lr 0.098996	time 2.3958 (2.4251)	model_time 2.3953 (2.3906)	loss 3.3805 (2.9836)	grad_norm 0.4518 (0.5010/0.0350)	mem 48464MB
[2023-11-10 03:46:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][50/1251]	eta 0:48:25 lr 0.098744	time 2.3996 (2.4195)	model_time 2.3993 (2.3916)	loss 3.5755 (3.0075)	grad_norm 0.5274 (0.5005/0.0358)	mem 48464MB
[2023-11-10 03:46:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][60/1251]	eta 0:47:57 lr 0.098493	time 2.3984 (2.4157)	model_time 2.3980 (2.3923)	loss 3.0104 (2.9989)	grad_norm 0.5215 (0.4986/0.0341)	mem 48464MB
[2023-11-10 03:47:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][70/1251]	eta 0:47:29 lr 0.098242	time 2.3920 (2.4126)	model_time 2.3916 (2.3924)	loss 2.1298 (2.9854)	grad_norm 0.4642 (0.4967/0.0334)	mem 48464MB
[2023-11-10 03:47:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][80/1251]	eta 0:47:02 lr 0.097991	time 2.3923 (2.4103)	model_time 2.3921 (2.3926)	loss 2.9802 (2.9537)	grad_norm 0.5229 (0.4945/0.0350)	mem 48464MB
[2023-11-10 03:48:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][90/1251]	eta 0:46:38 lr 0.097740	time 2.3949 (2.4101)	model_time 2.3944 (2.3942)	loss 3.3067 (2.9518)	grad_norm 0.4939 (0.4964/0.0358)	mem 48464MB
[2023-11-10 03:48:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][100/1251]	eta 0:46:12 lr 0.097489	time 2.3957 (2.4086)	model_time 2.3954 (2.3943)	loss 2.4114 (2.9156)	grad_norm 0.4544 (0.4946/0.0350)	mem 48464MB
[2023-11-10 03:48:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][110/1251]	eta 0:45:46 lr 0.097238	time 2.3982 (2.4074)	model_time 2.3977 (2.3943)	loss 3.5858 (2.9249)	grad_norm 0.4961 (0.4960/0.0354)	mem 48464MB
[2023-11-10 03:49:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][120/1251]	eta 0:45:21 lr 0.096987	time 2.3951 (2.4064)	model_time 2.3948 (2.3943)	loss 3.0208 (2.9262)	grad_norm 0.4583 (0.4944/0.0350)	mem 48464MB
[2023-11-10 03:49:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][130/1251]	eta 0:44:56 lr 0.096736	time 2.3911 (2.4055)	model_time 2.3907 (2.3944)	loss 2.8461 (2.9204)	grad_norm 0.5146 (0.4946/0.0354)	mem 48464MB
[2023-11-10 03:50:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][140/1251]	eta 0:44:31 lr 0.096485	time 2.3967 (2.4048)	model_time 2.3962 (2.3944)	loss 2.3322 (2.9160)	grad_norm 0.5034 (0.4944/0.0346)	mem 48464MB
[2023-11-10 03:50:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][150/1251]	eta 0:44:07 lr 0.096234	time 2.3952 (2.4042)	model_time 2.3949 (2.3945)	loss 3.6494 (2.9216)	grad_norm 0.5266 (0.4953/0.0349)	mem 48464MB
[2023-11-10 03:50:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][160/1251]	eta 0:43:42 lr 0.095983	time 2.3928 (2.4035)	model_time 2.3925 (2.3944)	loss 3.0908 (2.9137)	grad_norm 0.4599 (0.4949/0.0342)	mem 48464MB
[2023-11-10 03:51:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][170/1251]	eta 0:43:17 lr 0.095732	time 2.3943 (2.4029)	model_time 2.3940 (2.3943)	loss 3.0484 (2.9156)	grad_norm 0.4703 (0.4947/0.0337)	mem 48464MB
[2023-11-10 03:51:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][180/1251]	eta 0:42:53 lr 0.095481	time 2.3960 (2.4024)	model_time 2.3953 (2.3942)	loss 2.8369 (2.8978)	grad_norm 0.4393 (0.4935/0.0338)	mem 48464MB
[2023-11-10 03:52:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][190/1251]	eta 0:42:28 lr 0.095230	time 2.3933 (2.4020)	model_time 2.3929 (2.3942)	loss 1.9172 (2.8893)	grad_norm 0.4744 (0.4937/0.0336)	mem 48464MB
[2023-11-10 03:52:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][200/1251]	eta 0:42:04 lr 0.094980	time 2.3927 (2.4017)	model_time 2.3923 (2.3943)	loss 1.7124 (2.8769)	grad_norm 0.4567 (0.4933/0.0332)	mem 48464MB
[2023-11-10 03:52:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][210/1251]	eta 0:41:39 lr 0.094729	time 2.3954 (2.4015)	model_time 2.3948 (2.3943)	loss 1.8014 (2.8667)	grad_norm 0.5022 (0.4927/0.0333)	mem 48464MB
[2023-11-10 03:53:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][220/1251]	eta 0:41:15 lr 0.094478	time 2.3916 (2.4011)	model_time 2.3914 (2.3943)	loss 2.8092 (2.8537)	grad_norm 0.5217 (0.4926/0.0330)	mem 48464MB
[2023-11-10 03:53:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][230/1251]	eta 0:40:51 lr 0.094227	time 2.3952 (2.4009)	model_time 2.3947 (2.3943)	loss 3.3505 (2.8457)	grad_norm 0.4730 (0.4916/0.0331)	mem 48464MB
[2023-11-10 03:54:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][240/1251]	eta 0:40:27 lr 0.093977	time 2.3958 (2.4006)	model_time 2.3955 (2.3943)	loss 2.8934 (2.8457)	grad_norm 0.5259 (0.4912/0.0331)	mem 48464MB
[2023-11-10 03:54:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][250/1251]	eta 0:40:02 lr 0.093726	time 2.3986 (2.4004)	model_time 2.3980 (2.3943)	loss 3.3189 (2.8395)	grad_norm 0.4871 (0.4900/0.0333)	mem 48464MB
[2023-11-10 03:54:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][260/1251]	eta 0:39:38 lr 0.093475	time 2.3977 (2.4002)	model_time 2.3971 (2.3944)	loss 2.8264 (2.8434)	grad_norm 0.4405 (0.4905/0.0341)	mem 48464MB
[2023-11-10 03:55:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][270/1251]	eta 0:39:14 lr 0.093225	time 2.3936 (2.4000)	model_time 2.3932 (2.3944)	loss 3.1869 (2.8436)	grad_norm 0.5091 (0.4907/0.0344)	mem 48464MB
[2023-11-10 03:55:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][280/1251]	eta 0:38:50 lr 0.092974	time 2.3975 (2.3999)	model_time 2.3970 (2.3944)	loss 3.4781 (2.8407)	grad_norm 0.4801 (0.4910/0.0345)	mem 48464MB
[2023-11-10 03:56:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][290/1251]	eta 0:38:26 lr 0.092724	time 2.3978 (2.3997)	model_time 2.3973 (2.3944)	loss 3.3042 (2.8474)	grad_norm 0.5439 (0.4915/0.0346)	mem 48464MB
[2023-11-10 03:56:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][300/1251]	eta 0:38:02 lr 0.092473	time 2.3920 (2.3999)	model_time 2.3916 (2.3948)	loss 2.4225 (2.8492)	grad_norm 0.5355 (0.4916/0.0344)	mem 48464MB
[2023-11-10 03:56:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][310/1251]	eta 0:37:38 lr 0.092223	time 2.3932 (2.3997)	model_time 2.3928 (2.3947)	loss 2.9017 (2.8494)	grad_norm 0.4535 (0.4918/0.0351)	mem 48464MB
[2023-11-10 03:57:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][320/1251]	eta 0:37:13 lr 0.091973	time 2.3940 (2.3995)	model_time 2.3935 (2.3947)	loss 1.6805 (2.8505)	grad_norm 0.4426 (0.4916/0.0352)	mem 48464MB
[2023-11-10 03:57:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][330/1251]	eta 0:36:49 lr 0.091722	time 2.3921 (2.3993)	model_time 2.3918 (2.3946)	loss 3.0640 (2.8498)	grad_norm 0.4654 (0.4914/0.0353)	mem 48464MB
[2023-11-10 03:58:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][340/1251]	eta 0:36:25 lr 0.091472	time 2.3946 (2.3992)	model_time 2.3942 (2.3946)	loss 3.4156 (2.8546)	grad_norm 0.4821 (0.4906/0.0343)	mem 48464MB
[2023-11-10 03:58:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][350/1251]	eta 0:36:01 lr 0.091222	time 2.3922 (2.3991)	model_time 2.3919 (2.3946)	loss 1.6207 (2.8421)	grad_norm 0.4635 (0.4902/0.0345)	mem 48464MB
[2023-11-10 03:58:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][360/1251]	eta 0:35:37 lr 0.090972	time 2.3924 (2.3989)	model_time 2.3921 (2.3945)	loss 2.9904 (2.8421)	grad_norm 0.5340 (0.4899/0.0347)	mem 48464MB
[2023-11-10 03:59:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][370/1251]	eta 0:35:13 lr 0.090722	time 2.3936 (2.3987)	model_time 2.3935 (2.3945)	loss 2.7177 (2.8458)	grad_norm 0.4873 (0.4900/0.0347)	mem 48464MB
[2023-11-10 03:59:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][380/1251]	eta 0:34:49 lr 0.090472	time 2.3920 (2.3986)	model_time 2.3914 (2.3944)	loss 3.0447 (2.8407)	grad_norm 0.5202 (0.4900/0.0342)	mem 48464MB
[2023-11-10 04:00:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][390/1251]	eta 0:34:25 lr 0.090222	time 2.3974 (2.3987)	model_time 2.3970 (2.3947)	loss 2.6626 (2.8423)	grad_norm 0.4735 (0.4898/0.0343)	mem 48464MB
[2023-11-10 04:00:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][400/1251]	eta 0:34:01 lr 0.089972	time 2.3981 (2.3986)	model_time 2.3978 (2.3947)	loss 2.8617 (2.8365)	grad_norm 0.5805 (0.4905/0.0349)	mem 48464MB
[2023-11-10 04:00:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][410/1251]	eta 0:33:37 lr 0.089722	time 2.3984 (2.3985)	model_time 2.3980 (2.3946)	loss 3.3529 (2.8355)	grad_norm 0.4579 (0.4896/0.0342)	mem 48464MB
[2023-11-10 04:01:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][420/1251]	eta 0:33:13 lr 0.089472	time 2.3979 (2.3985)	model_time 2.3976 (2.3946)	loss 3.0164 (2.8309)	grad_norm 0.4666 (0.4897/0.0344)	mem 48464MB
[2023-11-10 04:01:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][430/1251]	eta 0:32:49 lr 0.089223	time 2.3954 (2.3984)	model_time 2.3949 (2.3946)	loss 2.5821 (2.8241)	grad_norm 0.4595 (0.4889/0.0343)	mem 48464MB
[2023-11-10 04:02:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][440/1251]	eta 0:32:25 lr 0.088973	time 2.3931 (2.3983)	model_time 2.3927 (2.3947)	loss 3.3541 (2.8209)	grad_norm 0.5268 (0.4889/0.0344)	mem 48464MB
[2023-11-10 04:02:24 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][450/1251]	eta 0:32:01 lr 0.088723	time 2.3937 (2.3983)	model_time 2.3933 (2.3947)	loss 3.0879 (2.8261)	grad_norm 0.4551 (0.4881/0.0340)	mem 48464MB
[2023-11-10 04:02:48 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][460/1251]	eta 0:31:36 lr 0.088474	time 2.3919 (2.3982)	model_time 2.3916 (2.3947)	loss 3.6711 (2.8283)	grad_norm 0.5396 (0.4883/0.0343)	mem 48464MB
[2023-11-10 04:03:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][470/1251]	eta 0:31:12 lr 0.088224	time 2.3922 (2.3981)	model_time 2.3919 (2.3946)	loss 2.0249 (2.8332)	grad_norm 0.4362 (0.4881/0.0343)	mem 48464MB
[2023-11-10 04:03:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][480/1251]	eta 0:30:48 lr 0.087975	time 2.3976 (2.3980)	model_time 2.3973 (2.3946)	loss 2.1587 (2.8323)	grad_norm 0.5282 (0.4889/0.0344)	mem 48464MB
[2023-11-10 04:03:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][490/1251]	eta 0:30:24 lr 0.087726	time 2.3918 (2.3979)	model_time 2.3915 (2.3946)	loss 2.0292 (2.8257)	grad_norm 0.4847 (0.4883/0.0343)	mem 48464MB
[2023-11-10 04:04:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][500/1251]	eta 0:30:00 lr 0.087477	time 2.3985 (2.3979)	model_time 2.3981 (2.3946)	loss 3.5048 (2.8233)	grad_norm 0.5294 (0.4882/0.0346)	mem 48464MB
[2023-11-10 04:04:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][510/1251]	eta 0:29:36 lr 0.087228	time 2.5178 (2.3980)	model_time 2.5176 (2.3948)	loss 2.7059 (2.8186)	grad_norm 0.4670 (0.4885/0.0345)	mem 48464MB
[2023-11-10 04:05:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][520/1251]	eta 0:29:12 lr 0.086979	time 2.3980 (2.3980)	model_time 2.3976 (2.3948)	loss 3.0109 (2.8206)	grad_norm 0.4553 (0.4891/0.0352)	mem 48464MB
[2023-11-10 04:05:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][530/1251]	eta 0:28:48 lr 0.086730	time 2.3963 (2.3979)	model_time 2.3958 (2.3948)	loss 2.6029 (2.8217)	grad_norm 0.5396 (0.4899/0.0353)	mem 48464MB
[2023-11-10 04:05:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][540/1251]	eta 0:28:24 lr 0.086481	time 2.3985 (2.3979)	model_time 2.3982 (2.3948)	loss 2.7955 (2.8261)	grad_norm 0.4507 (0.4899/0.0353)	mem 48464MB
[2023-11-10 04:06:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][550/1251]	eta 0:28:00 lr 0.086232	time 2.3974 (2.3978)	model_time 2.3970 (2.3948)	loss 1.5643 (2.8195)	grad_norm 0.5167 (0.4912/0.0351)	mem 48464MB
[2023-11-10 04:06:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][560/1251]	eta 0:27:36 lr 0.085983	time 2.3948 (2.3980)	model_time 2.3945 (2.3950)	loss 2.8973 (2.8216)	grad_norm 0.4659 (0.4901/0.0346)	mem 48464MB
[2023-11-10 04:07:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][570/1251]	eta 0:27:12 lr 0.085735	time 2.3944 (2.3979)	model_time 2.3940 (2.3949)	loss 2.4341 (2.8221)	grad_norm 0.4977 (0.4900/0.0340)	mem 48464MB
[2023-11-10 04:07:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][580/1251]	eta 0:26:48 lr 0.085486	time 2.3971 (2.3979)	model_time 2.3963 (2.3950)	loss 1.8300 (2.8196)	grad_norm 0.4541 (0.4892/0.0337)	mem 48464MB
[2023-11-10 04:07:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][590/1251]	eta 0:26:24 lr 0.085238	time 2.3930 (2.3978)	model_time 2.3925 (2.3950)	loss 3.0560 (2.8221)	grad_norm 0.4766 (0.4884/0.0336)	mem 48464MB
[2023-11-10 04:08:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][600/1251]	eta 0:26:00 lr 0.084989	time 2.3979 (2.3978)	model_time 2.3976 (2.3950)	loss 2.6775 (2.8149)	grad_norm 0.4663 (0.4874/0.0333)	mem 48464MB
[2023-11-10 04:08:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][610/1251]	eta 0:25:36 lr 0.084741	time 2.3925 (2.3978)	model_time 2.3921 (2.3950)	loss 2.0783 (2.8131)	grad_norm 0.4541 (0.4870/0.0331)	mem 48464MB
[2023-11-10 04:09:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][620/1251]	eta 0:25:12 lr 0.084493	time 2.3984 (2.3978)	model_time 2.3979 (2.3950)	loss 2.2198 (2.8153)	grad_norm 0.5066 (0.4877/0.0333)	mem 48464MB
[2023-11-10 04:09:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][630/1251]	eta 0:24:48 lr 0.084245	time 2.3990 (2.3977)	model_time 2.3987 (2.3950)	loss 3.6651 (2.8144)	grad_norm 0.5366 (0.4883/0.0334)	mem 48464MB
[2023-11-10 04:09:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][640/1251]	eta 0:24:24 lr 0.083997	time 2.3948 (2.3977)	model_time 2.3944 (2.3950)	loss 3.2688 (2.8145)	grad_norm 0.4775 (0.4875/0.0336)	mem 48464MB
[2023-11-10 04:10:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][650/1251]	eta 0:24:01 lr 0.083749	time 2.3954 (2.3977)	model_time 2.3951 (2.3950)	loss 1.6588 (2.8094)	grad_norm 0.4685 (0.4873/0.0330)	mem 48464MB
[2023-11-10 04:10:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][660/1251]	eta 0:23:37 lr 0.083501	time 2.3907 (2.3976)	model_time 2.3902 (2.3950)	loss 2.6042 (2.8099)	grad_norm 0.4522 (0.4877/0.0330)	mem 48464MB
[2023-11-10 04:11:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][670/1251]	eta 0:23:12 lr 0.083254	time 2.3923 (2.3976)	model_time 2.3920 (2.3950)	loss 2.3140 (2.8080)	grad_norm 0.5245 (0.4880/0.0331)	mem 48464MB
[2023-11-10 04:11:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][680/1251]	eta 0:22:48 lr 0.083006	time 2.3919 (2.3975)	model_time 2.3914 (2.3949)	loss 3.7446 (2.8122)	grad_norm 0.4880 (0.4887/0.0338)	mem 48464MB
[2023-11-10 04:11:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][690/1251]	eta 0:22:24 lr 0.082759	time 2.3928 (2.3975)	model_time 2.3924 (2.3949)	loss 3.0106 (2.8086)	grad_norm 0.5072 (0.4882/0.0333)	mem 48464MB
[2023-11-10 04:12:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][700/1251]	eta 0:22:00 lr 0.082512	time 2.3946 (2.3974)	model_time 2.3942 (2.3949)	loss 2.9900 (2.8040)	grad_norm 0.4524 (0.4877/0.0329)	mem 48464MB
[2023-11-10 04:12:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][710/1251]	eta 0:21:36 lr 0.082264	time 2.4084 (2.3974)	model_time 2.4077 (2.3949)	loss 2.4747 (2.8050)	grad_norm 0.4691 (0.4880/0.0333)	mem 48464MB
[2023-11-10 04:13:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][720/1251]	eta 0:21:12 lr 0.082017	time 2.3899 (2.3974)	model_time 2.3895 (2.3949)	loss 3.2054 (2.8034)	grad_norm 0.4791 (0.4882/0.0344)	mem 48464MB
[2023-11-10 04:13:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][730/1251]	eta 0:20:49 lr 0.081770	time 2.3945 (2.3973)	model_time 2.3942 (2.3949)	loss 2.4031 (2.8038)	grad_norm 0.4487 (0.4885/0.0345)	mem 48464MB
[2023-11-10 04:13:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][740/1251]	eta 0:20:25 lr 0.081523	time 2.3942 (2.3973)	model_time 2.3940 (2.3949)	loss 2.2513 (2.8024)	grad_norm 0.4899 (0.4883/0.0347)	mem 48464MB
[2023-11-10 04:14:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][750/1251]	eta 0:20:01 lr 0.081277	time 2.3953 (2.3973)	model_time 2.3949 (2.3949)	loss 2.0702 (2.7963)	grad_norm 0.4494 (0.4884/0.0348)	mem 48464MB
[2023-11-10 04:14:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][760/1251]	eta 0:19:37 lr 0.081030	time 2.3914 (2.3973)	model_time 2.3912 (2.3949)	loss 3.1500 (2.7968)	grad_norm 0.4934 (0.4883/0.0345)	mem 48464MB
[2023-11-10 04:15:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][770/1251]	eta 0:19:13 lr 0.080784	time 2.3883 (2.3972)	model_time 2.3879 (2.3948)	loss 3.0325 (2.7958)	grad_norm 0.4961 (0.4879/0.0345)	mem 48464MB
[2023-11-10 04:15:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][780/1251]	eta 0:18:49 lr 0.080537	time 2.3924 (2.3972)	model_time 2.3920 (2.3948)	loss 3.3921 (2.7976)	grad_norm 0.4813 (0.4878/0.0345)	mem 48464MB
[2023-11-10 04:15:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][790/1251]	eta 0:18:25 lr 0.080291	time 2.3914 (2.3971)	model_time 2.3907 (2.3948)	loss 3.0794 (2.7939)	grad_norm 0.5396 (0.4886/0.0347)	mem 48464MB
[2023-11-10 04:16:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][800/1251]	eta 0:18:01 lr 0.080045	time 2.3923 (2.3971)	model_time 2.3918 (2.3948)	loss 2.9493 (2.7916)	grad_norm 0.5366 (0.4883/0.0350)	mem 48464MB
[2023-11-10 04:16:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][810/1251]	eta 0:17:37 lr 0.079799	time 2.3975 (2.3971)	model_time 2.3972 (2.3948)	loss 3.0233 (2.7934)	grad_norm 0.4851 (0.4887/0.0354)	mem 48464MB
[2023-11-10 04:17:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][820/1251]	eta 0:17:13 lr 0.079553	time 2.3922 (2.3970)	model_time 2.3919 (2.3947)	loss 2.0599 (2.7910)	grad_norm 0.4791 (0.4881/0.0347)	mem 48464MB
[2023-11-10 04:17:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][830/1251]	eta 0:16:49 lr 0.079307	time 2.3905 (2.3970)	model_time 2.3901 (2.3947)	loss 2.0489 (2.7876)	grad_norm 0.4612 (0.4878/0.0351)	mem 48464MB
[2023-11-10 04:17:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][840/1251]	eta 0:16:25 lr 0.079061	time 2.3979 (2.3969)	model_time 2.3975 (2.3947)	loss 2.6655 (2.7891)	grad_norm 0.4448 (0.4883/0.0354)	mem 48464MB
[2023-11-10 04:18:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][850/1251]	eta 0:16:01 lr 0.078816	time 2.3922 (2.3969)	model_time 2.3919 (2.3947)	loss 2.6249 (2.7895)	grad_norm 0.4928 (0.4882/0.0354)	mem 48464MB
[2023-11-10 04:18:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][860/1251]	eta 0:15:37 lr 0.078571	time 2.3918 (2.3971)	model_time 2.3914 (2.3949)	loss 2.9056 (2.7878)	grad_norm 0.4836 (0.4884/0.0351)	mem 48464MB
[2023-11-10 04:19:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][870/1251]	eta 0:15:13 lr 0.078325	time 2.3936 (2.3971)	model_time 2.3933 (2.3949)	loss 3.6589 (2.7888)	grad_norm 0.5368 (0.4883/0.0352)	mem 48464MB
[2023-11-10 04:19:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][880/1251]	eta 0:14:49 lr 0.078080	time 2.3981 (2.3971)	model_time 2.3973 (2.3949)	loss 2.6711 (2.7891)	grad_norm 0.5144 (0.4887/0.0348)	mem 48464MB
[2023-11-10 04:19:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][890/1251]	eta 0:14:25 lr 0.077835	time 2.4019 (2.3970)	model_time 2.4013 (2.3949)	loss 3.6543 (2.7887)	grad_norm 0.5768 (0.4887/0.0349)	mem 48464MB
[2023-11-10 04:20:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][900/1251]	eta 0:14:01 lr 0.077591	time 2.3977 (2.3970)	model_time 2.3970 (2.3949)	loss 3.6120 (2.7942)	grad_norm 0.4848 (0.4896/0.0350)	mem 48464MB
[2023-11-10 04:20:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][910/1251]	eta 0:13:37 lr 0.077346	time 2.3974 (2.3970)	model_time 2.3971 (2.3949)	loss 2.2152 (2.7964)	grad_norm 0.5050 (0.4907/0.0359)	mem 48464MB
[2023-11-10 04:21:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][920/1251]	eta 0:13:13 lr 0.077101	time 2.3915 (2.3970)	model_time 2.3912 (2.3949)	loss 3.6252 (2.7985)	grad_norm 0.5156 (0.4899/0.0352)	mem 48464MB
[2023-11-10 04:21:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][930/1251]	eta 0:12:49 lr 0.076857	time 2.3930 (2.3969)	model_time 2.3927 (2.3949)	loss 2.7364 (2.7949)	grad_norm 0.4637 (0.4888/0.0350)	mem 48464MB
[2023-11-10 04:21:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][940/1251]	eta 0:12:25 lr 0.076613	time 2.3935 (2.3969)	model_time 2.3932 (2.3948)	loss 3.6157 (2.7961)	grad_norm 0.4978 (0.4891/0.0346)	mem 48464MB
[2023-11-10 04:22:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][950/1251]	eta 0:12:01 lr 0.076369	time 2.3908 (2.3969)	model_time 2.3906 (2.3948)	loss 2.0320 (2.7959)	grad_norm 0.4944 (0.4896/0.0348)	mem 48464MB
[2023-11-10 04:22:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][960/1251]	eta 0:11:37 lr 0.076125	time 2.3977 (2.3968)	model_time 2.3974 (2.3948)	loss 3.3188 (2.7957)	grad_norm 0.5089 (0.4894/0.0350)	mem 48464MB
[2023-11-10 04:23:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][970/1251]	eta 0:11:13 lr 0.075881	time 2.3954 (2.3968)	model_time 2.3952 (2.3948)	loss 3.2103 (2.7980)	grad_norm 0.5215 (0.4897/0.0350)	mem 48464MB
[2023-11-10 04:23:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][980/1251]	eta 0:10:49 lr 0.075637	time 2.3953 (2.3968)	model_time 2.3950 (2.3948)	loss 2.4961 (2.7984)	grad_norm 0.4917 (0.4892/0.0341)	mem 48464MB
[2023-11-10 04:23:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][990/1251]	eta 0:10:25 lr 0.075394	time 2.3964 (2.3968)	model_time 2.3961 (2.3948)	loss 2.3703 (2.7998)	grad_norm 0.4420 (0.4891/0.0341)	mem 48464MB
[2023-11-10 04:24:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1000/1251]	eta 0:10:01 lr 0.075150	time 2.3989 (2.3968)	model_time 2.3986 (2.3948)	loss 2.7397 (2.7992)	grad_norm 0.4801 (0.4895/0.0343)	mem 48464MB
[2023-11-10 04:24:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1010/1251]	eta 0:09:37 lr 0.074907	time 2.3973 (2.3967)	model_time 2.3970 (2.3948)	loss 3.1320 (2.8029)	grad_norm 0.4839 (0.4891/0.0341)	mem 48464MB
[2023-11-10 04:25:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1020/1251]	eta 0:09:13 lr 0.074664	time 2.3913 (2.3967)	model_time 2.3910 (2.3947)	loss 2.0112 (2.8035)	grad_norm 0.5050 (0.4892/0.0328)	mem 48464MB
[2023-11-10 04:25:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1030/1251]	eta 0:08:49 lr 0.074421	time 2.3945 (2.3968)	model_time 2.3941 (2.3948)	loss 2.8334 (2.7998)	grad_norm 0.4757 (0.4886/0.0328)	mem 48464MB
[2023-11-10 04:25:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1040/1251]	eta 0:08:25 lr 0.074179	time 2.3947 (2.3968)	model_time 2.3942 (2.3948)	loss 3.1351 (2.7999)	grad_norm 0.4641 (0.4882/0.0325)	mem 48464MB
[2023-11-10 04:26:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1050/1251]	eta 0:08:01 lr 0.073936	time 2.3934 (2.3969)	model_time 2.3932 (2.3949)	loss 2.3054 (2.8009)	grad_norm 0.4535 (0.4887/0.0325)	mem 48464MB
[2023-11-10 04:26:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1060/1251]	eta 0:07:37 lr 0.073694	time 2.3929 (2.3968)	model_time 2.3923 (2.3949)	loss 2.3662 (2.8018)	grad_norm 0.4776 (0.4886/0.0328)	mem 48464MB
[2023-11-10 04:27:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1070/1251]	eta 0:07:13 lr 0.073452	time 2.3928 (2.3968)	model_time 2.3925 (2.3949)	loss 1.7288 (2.8009)	grad_norm 0.4613 (0.4888/0.0332)	mem 48464MB
[2023-11-10 04:27:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1080/1251]	eta 0:06:49 lr 0.073210	time 2.3944 (2.3968)	model_time 2.3941 (2.3949)	loss 2.8927 (2.7993)	grad_norm 0.5022 (0.4881/0.0336)	mem 48464MB
[2023-11-10 04:27:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1090/1251]	eta 0:06:25 lr 0.072968	time 2.3909 (2.3968)	model_time 2.3905 (2.3949)	loss 3.7211 (2.8026)	grad_norm 0.4937 (0.4875/0.0332)	mem 48464MB
[2023-11-10 04:28:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1100/1251]	eta 0:06:01 lr 0.072726	time 2.3937 (2.3967)	model_time 2.3933 (2.3949)	loss 2.9911 (2.8027)	grad_norm 0.4587 (0.4880/0.0328)	mem 48464MB
[2023-11-10 04:28:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1110/1251]	eta 0:05:37 lr 0.072485	time 2.3955 (2.3967)	model_time 2.3952 (2.3949)	loss 2.9680 (2.8033)	grad_norm 0.5214 (0.4876/0.0325)	mem 48464MB
[2023-11-10 04:29:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1120/1251]	eta 0:05:13 lr 0.072243	time 2.3992 (2.3967)	model_time 2.3989 (2.3949)	loss 3.5623 (2.8035)	grad_norm 0.5328 (0.4874/0.0326)	mem 48464MB
[2023-11-10 04:29:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1130/1251]	eta 0:04:49 lr 0.072002	time 2.3928 (2.3967)	model_time 2.3924 (2.3949)	loss 3.2683 (2.8034)	grad_norm 0.4995 (0.4877/0.0323)	mem 48464MB
[2023-11-10 04:29:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1140/1251]	eta 0:04:26 lr 0.071761	time 2.3963 (2.3966)	model_time 2.3957 (2.3948)	loss 1.7661 (2.8031)	grad_norm 0.4917 (0.4873/0.0322)	mem 48464MB
[2023-11-10 04:30:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1150/1251]	eta 0:04:02 lr 0.071520	time 2.4054 (2.3967)	model_time 2.4047 (2.3949)	loss 2.3873 (2.7997)	grad_norm 0.4870 (0.4871/0.0323)	mem 48464MB
[2023-11-10 04:30:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1160/1251]	eta 0:03:38 lr 0.071280	time 2.3919 (2.3967)	model_time 2.3916 (2.3949)	loss 3.2721 (2.7997)	grad_norm 0.4817 (0.4882/0.0326)	mem 48464MB
[2023-11-10 04:31:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1170/1251]	eta 0:03:14 lr 0.071039	time 2.3949 (2.3966)	model_time 2.3946 (2.3949)	loss 2.8497 (2.8010)	grad_norm 0.5202 (0.4882/0.0324)	mem 48464MB
[2023-11-10 04:31:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1180/1251]	eta 0:02:50 lr 0.070799	time 2.3922 (2.3966)	model_time 2.3918 (2.3949)	loss 2.2100 (2.7995)	grad_norm 0.5003 (0.4881/0.0330)	mem 48464MB
[2023-11-10 04:31:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1190/1251]	eta 0:02:26 lr 0.070559	time 2.3940 (2.3966)	model_time 2.3937 (2.3948)	loss 2.8632 (2.7971)	grad_norm 0.4459 (0.4882/0.0325)	mem 48464MB
[2023-11-10 04:32:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1200/1251]	eta 0:02:02 lr 0.070319	time 2.3929 (2.3966)	model_time 2.3926 (2.3948)	loss 2.9020 (2.7982)	grad_norm 0.5256 (0.4880/0.0326)	mem 48464MB
[2023-11-10 04:32:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1210/1251]	eta 0:01:38 lr 0.070079	time 2.3955 (2.3966)	model_time 2.3951 (2.3948)	loss 2.5876 (2.7990)	grad_norm 0.4855 (0.4873/0.0310)	mem 48464MB
[2023-11-10 04:33:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1220/1251]	eta 0:01:14 lr 0.069840	time 2.3979 (2.3965)	model_time 2.3953 (2.3948)	loss 2.2472 (2.7997)	grad_norm 0.4280 (0.4866/0.0312)	mem 48464MB
[2023-11-10 04:33:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1230/1251]	eta 0:00:50 lr 0.069600	time 2.3912 (2.3965)	model_time 2.3906 (2.3948)	loss 2.5813 (2.8004)	grad_norm 0.4699 (0.4866/0.0314)	mem 48464MB
[2023-11-10 04:33:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1240/1251]	eta 0:00:26 lr 0.069361	time 2.3942 (2.3965)	model_time 2.3941 (2.3948)	loss 2.0650 (2.7986)	grad_norm 0.4773 (0.4872/0.0316)	mem 48464MB
[2023-11-10 04:34:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [5/10][1250/1251]	eta 0:00:02 lr 0.069122	time 2.3894 (2.3965)	model_time 2.3893 (2.3948)	loss 2.6680 (2.7983)	grad_norm 0.5324 (0.4874/0.0317)	mem 48464MB
[2023-11-10 04:34:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 515): INFO EPOCH 5 training takes 0:49:58
[2023-11-10 04:34:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_5.pth saving......
[2023-11-10 04:36:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_5.pth saved !!!
[2023-11-10 04:36:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.771 (3.771)	Loss 0.6060 (0.6060)	Acc@1 88.086 (88.086)	Acc@5 98.535 (98.535)	Mem 48464MB
[2023-11-10 04:36:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.246 (2.376)	Loss 0.7007 (0.6112)	Acc@1 85.645 (88.068)	Acc@5 97.656 (98.384)	Mem 48464MB
[2023-11-10 04:36:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.251 (2.316)	Loss 0.5347 (0.6127)	Acc@1 90.137 (87.886)	Acc@5 98.730 (98.414)	Mem 48464MB
[2023-11-10 04:37:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.251 (2.296)	Loss 0.6699 (0.6171)	Acc@1 86.621 (87.780)	Acc@5 97.656 (98.403)	Mem 48464MB
[2023-11-10 04:37:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.252 (2.285)	Loss 0.6641 (0.6202)	Acc@1 87.109 (87.769)	Acc@5 97.559 (98.371)	Mem 48464MB
[2023-11-10 04:38:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:5] * Acc@1 87.818 Acc@5 98.388
[2023-11-10 04:38:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 339): INFO Accuracy of the network on the 50000 test images: 87.8%
[2023-11-10 04:38:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saving......
[2023-11-10 04:39:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saved !!!
[2023-11-10 04:39:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 354): INFO Max accuracy: 87.82%
[2023-11-10 04:39:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.809 (3.809)	Loss 0.5972 (0.5972)	Acc@1 87.695 (87.695)	Acc@5 98.633 (98.633)	Mem 48464MB
[2023-11-10 04:40:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.242 (2.379)	Loss 0.6899 (0.5987)	Acc@1 86.523 (88.148)	Acc@5 97.754 (98.402)	Mem 48464MB
[2023-11-10 04:40:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.251 (2.318)	Loss 0.5293 (0.5989)	Acc@1 90.039 (88.221)	Acc@5 98.730 (98.438)	Mem 48464MB
[2023-11-10 04:40:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.251 (2.296)	Loss 0.6543 (0.6031)	Acc@1 86.719 (88.067)	Acc@5 98.047 (98.456)	Mem 48464MB
[2023-11-10 04:41:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.252 (2.285)	Loss 0.6553 (0.6051)	Acc@1 87.012 (88.055)	Acc@5 97.754 (98.445)	Mem 48464MB
[2023-11-10 04:41:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:5] * Acc@1 88.100 Acc@5 98.452
[2023-11-10 04:41:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 359): INFO Accuracy of the ema network on the 50000 test images: 88.1%
[2023-11-10 04:41:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saving......
[2023-11-10 04:43:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saved !!!
[2023-11-10 04:43:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 374): INFO Max ema accuracy: 88.10%
[2023-11-10 04:43:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][0/1251]	eta 1:28:56 lr 0.069098	time 4.2654 (4.2654)	model_time 2.3940 (2.3940)	loss 2.0827 (2.0827)	grad_norm 0.5041 (0.5041/0.0000)	mem 48464MB
[2023-11-10 04:43:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][10/1251]	eta 0:52:49 lr 0.068860	time 2.3879 (2.5538)	model_time 2.3875 (2.3833)	loss 2.5758 (2.6284)	grad_norm 0.4387 (0.4820/0.0294)	mem 48464MB
[2023-11-10 04:44:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][20/1251]	eta 0:50:48 lr 0.068621	time 2.3944 (2.4765)	model_time 2.3940 (2.3869)	loss 2.6452 (2.6972)	grad_norm 0.4361 (0.4775/0.0247)	mem 48464MB
[2023-11-10 04:44:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][30/1251]	eta 0:49:49 lr 0.068383	time 2.3900 (2.4488)	model_time 2.3897 (2.3880)	loss 2.5056 (2.6746)	grad_norm 0.5005 (0.4758/0.0261)	mem 48464MB
[2023-11-10 04:44:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][40/1251]	eta 0:49:09 lr 0.068145	time 2.3921 (2.4352)	model_time 2.3918 (2.3891)	loss 2.6256 (2.7356)	grad_norm 0.4430 (0.4779/0.0311)	mem 48464MB
[2023-11-10 04:45:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][50/1251]	eta 0:48:35 lr 0.067907	time 2.3920 (2.4273)	model_time 2.3916 (2.3901)	loss 1.8313 (2.7014)	grad_norm 0.5370 (0.4804/0.0311)	mem 48464MB
[2023-11-10 04:45:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][60/1251]	eta 0:48:04 lr 0.067669	time 2.3899 (2.4215)	model_time 2.3895 (2.3904)	loss 2.8069 (2.6398)	grad_norm 0.5085 (0.4796/0.0302)	mem 48464MB
[2023-11-10 04:46:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][70/1251]	eta 0:47:35 lr 0.067431	time 2.3917 (2.4177)	model_time 2.3913 (2.3909)	loss 2.7982 (2.6674)	grad_norm 0.4521 (0.4781/0.0292)	mem 48464MB
[2023-11-10 04:46:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][80/1251]	eta 0:47:08 lr 0.067194	time 2.3941 (2.4150)	model_time 2.3937 (2.3915)	loss 3.0983 (2.7108)	grad_norm 0.5000 (0.4805/0.0307)	mem 48464MB
[2023-11-10 04:46:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][90/1251]	eta 0:46:41 lr 0.066957	time 2.4014 (2.4130)	model_time 2.4011 (2.3920)	loss 3.4989 (2.7119)	grad_norm 0.5214 (0.4805/0.0307)	mem 48464MB
[2023-11-10 04:47:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][100/1251]	eta 0:46:15 lr 0.066720	time 2.3923 (2.4114)	model_time 2.3919 (2.3924)	loss 2.9143 (2.7245)	grad_norm 0.4804 (0.4810/0.0304)	mem 48464MB
[2023-11-10 04:47:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][110/1251]	eta 0:45:49 lr 0.066483	time 2.3960 (2.4100)	model_time 2.3956 (2.3927)	loss 3.7545 (2.7113)	grad_norm 0.5019 (0.4823/0.0316)	mem 48464MB
[2023-11-10 04:48:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][120/1251]	eta 0:45:24 lr 0.066247	time 2.3929 (2.4087)	model_time 2.3926 (2.3927)	loss 2.8209 (2.7200)	grad_norm 0.5114 (0.4840/0.0323)	mem 48464MB
[2023-11-10 04:48:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][130/1251]	eta 0:45:00 lr 0.066010	time 2.3912 (2.4088)	model_time 2.3908 (2.3940)	loss 2.3880 (2.7369)	grad_norm 0.4553 (0.4834/0.0332)	mem 48464MB
[2023-11-10 04:48:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][140/1251]	eta 0:44:36 lr 0.065774	time 2.3942 (2.4090)	model_time 2.3939 (2.3953)	loss 3.6582 (2.7362)	grad_norm 0.5383 (0.4820/0.0334)	mem 48464MB
[2023-11-10 04:49:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][150/1251]	eta 0:44:11 lr 0.065539	time 2.3937 (2.4080)	model_time 2.3933 (2.3952)	loss 3.0996 (2.7415)	grad_norm 0.4610 (0.4831/0.0345)	mem 48464MB
[2023-11-10 04:49:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][160/1251]	eta 0:43:46 lr 0.065303	time 2.3939 (2.4074)	model_time 2.3935 (2.3953)	loss 2.8849 (2.7634)	grad_norm 0.4636 (0.4832/0.0349)	mem 48464MB
[2023-11-10 04:50:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][170/1251]	eta 0:43:21 lr 0.065067	time 2.3913 (2.4065)	model_time 2.3911 (2.3951)	loss 2.7212 (2.7536)	grad_norm 0.5000 (0.4835/0.0346)	mem 48464MB
[2023-11-10 04:50:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][180/1251]	eta 0:42:56 lr 0.064832	time 2.3918 (2.4058)	model_time 2.3916 (2.3950)	loss 2.3815 (2.7492)	grad_norm 0.4705 (0.4829/0.0341)	mem 48464MB
[2023-11-10 04:50:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][190/1251]	eta 0:42:31 lr 0.064597	time 2.3924 (2.4050)	model_time 2.3921 (2.3948)	loss 2.9390 (2.7608)	grad_norm 0.4790 (0.4827/0.0338)	mem 48464MB
[2023-11-10 04:51:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][200/1251]	eta 0:42:07 lr 0.064363	time 2.3983 (2.4045)	model_time 2.3981 (2.3947)	loss 2.5995 (2.7735)	grad_norm 0.4890 (0.4822/0.0333)	mem 48464MB
[2023-11-10 04:51:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][210/1251]	eta 0:41:42 lr 0.064128	time 2.3969 (2.4040)	model_time 2.3966 (2.3947)	loss 3.4675 (2.7702)	grad_norm 0.5088 (0.4824/0.0337)	mem 48464MB
[2023-11-10 04:52:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][220/1251]	eta 0:41:18 lr 0.063894	time 2.3943 (2.4037)	model_time 2.3940 (2.3947)	loss 2.8497 (2.7455)	grad_norm 0.5588 (0.4831/0.0336)	mem 48464MB
[2023-11-10 04:52:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][230/1251]	eta 0:40:53 lr 0.063660	time 2.3983 (2.4033)	model_time 2.3980 (2.3948)	loss 3.4675 (2.7518)	grad_norm 0.4981 (0.4837/0.0339)	mem 48464MB
[2023-11-10 04:52:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][240/1251]	eta 0:40:29 lr 0.063426	time 2.3930 (2.4029)	model_time 2.3927 (2.3947)	loss 3.5608 (2.7517)	grad_norm 0.5121 (0.4843/0.0339)	mem 48464MB
[2023-11-10 04:53:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][250/1251]	eta 0:40:04 lr 0.063192	time 2.3921 (2.4025)	model_time 2.3917 (2.3946)	loss 2.8388 (2.7522)	grad_norm 0.5280 (0.4846/0.0336)	mem 48464MB
[2023-11-10 04:53:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][260/1251]	eta 0:39:40 lr 0.062959	time 2.3930 (2.4023)	model_time 2.3927 (2.3946)	loss 3.0334 (2.7513)	grad_norm 0.4750 (0.4846/0.0335)	mem 48464MB
[2023-11-10 04:54:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][270/1251]	eta 0:39:16 lr 0.062726	time 2.3951 (2.4021)	model_time 2.3948 (2.3948)	loss 3.3897 (2.7525)	grad_norm 0.5212 (0.4847/0.0338)	mem 48464MB
[2023-11-10 04:54:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][280/1251]	eta 0:38:52 lr 0.062493	time 2.3947 (2.4019)	model_time 2.3943 (2.3948)	loss 3.4649 (2.7516)	grad_norm 0.5864 (0.4851/0.0342)	mem 48464MB
[2023-11-10 04:54:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][290/1251]	eta 0:38:28 lr 0.062260	time 2.3956 (2.4017)	model_time 2.3953 (2.3948)	loss 2.7894 (2.7564)	grad_norm 0.4525 (0.4849/0.0340)	mem 48464MB
[2023-11-10 04:55:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][300/1251]	eta 0:38:03 lr 0.062028	time 2.3954 (2.4015)	model_time 2.3950 (2.3948)	loss 3.1183 (2.7549)	grad_norm 0.4661 (0.4851/0.0339)	mem 48464MB
[2023-11-10 04:55:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][310/1251]	eta 0:37:39 lr 0.061795	time 2.3947 (2.4013)	model_time 2.3942 (2.3948)	loss 3.2028 (2.7562)	grad_norm 0.4857 (0.4856/0.0341)	mem 48464MB
[2023-11-10 04:56:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][320/1251]	eta 0:37:15 lr 0.061564	time 2.3931 (2.4010)	model_time 2.3928 (2.3947)	loss 2.2379 (2.7500)	grad_norm 0.4355 (0.4858/0.0345)	mem 48464MB
[2023-11-10 04:56:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][330/1251]	eta 0:36:51 lr 0.061332	time 2.3945 (2.4008)	model_time 2.3940 (2.3947)	loss 3.7954 (2.7589)	grad_norm 0.4468 (0.4861/0.0347)	mem 48464MB
[2023-11-10 04:56:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][340/1251]	eta 0:36:27 lr 0.061100	time 2.3991 (2.4007)	model_time 2.3988 (2.3947)	loss 3.3220 (2.7599)	grad_norm 0.4595 (0.4859/0.0347)	mem 48464MB
[2023-11-10 04:57:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][350/1251]	eta 0:36:02 lr 0.060869	time 2.4002 (2.4006)	model_time 2.4000 (2.3948)	loss 2.9401 (2.7538)	grad_norm 0.5252 (0.4859/0.0347)	mem 48464MB
[2023-11-10 04:57:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][360/1251]	eta 0:35:39 lr 0.060638	time 2.3970 (2.4008)	model_time 2.3967 (2.3951)	loss 3.0329 (2.7620)	grad_norm 0.4865 (0.4863/0.0345)	mem 48464MB
[2023-11-10 04:58:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][370/1251]	eta 0:35:14 lr 0.060407	time 2.3923 (2.4006)	model_time 2.3920 (2.3951)	loss 2.8110 (2.7687)	grad_norm 0.4888 (0.4874/0.0346)	mem 48464MB
[2023-11-10 04:58:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][380/1251]	eta 0:34:50 lr 0.060177	time 2.3979 (2.4005)	model_time 2.3976 (2.3951)	loss 2.9432 (2.7651)	grad_norm 0.5648 (0.4873/0.0346)	mem 48464MB
[2023-11-10 04:58:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][390/1251]	eta 0:34:26 lr 0.059947	time 2.3958 (2.4004)	model_time 2.3953 (2.3951)	loss 2.4391 (2.7683)	grad_norm 0.4503 (0.4872/0.0345)	mem 48464MB
[2023-11-10 04:59:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][400/1251]	eta 0:34:02 lr 0.059717	time 2.3930 (2.4002)	model_time 2.3926 (2.3951)	loss 3.4906 (2.7747)	grad_norm 0.4647 (0.4875/0.0346)	mem 48464MB
[2023-11-10 04:59:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][410/1251]	eta 0:33:38 lr 0.059487	time 2.3939 (2.4001)	model_time 2.3937 (2.3951)	loss 2.0554 (2.7706)	grad_norm 0.4610 (0.4868/0.0347)	mem 48464MB
[2023-11-10 05:00:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][420/1251]	eta 0:33:14 lr 0.059258	time 2.4000 (2.4000)	model_time 2.3994 (2.3951)	loss 3.5027 (2.7699)	grad_norm 0.4897 (0.4857/0.0345)	mem 48464MB
[2023-11-10 05:00:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][430/1251]	eta 0:32:50 lr 0.059028	time 2.3916 (2.3998)	model_time 2.3910 (2.3950)	loss 2.8981 (2.7680)	grad_norm 0.4953 (0.4863/0.0339)	mem 48464MB
[2023-11-10 05:00:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][440/1251]	eta 0:32:26 lr 0.058799	time 2.3933 (2.4000)	model_time 2.3929 (2.3953)	loss 2.7251 (2.7720)	grad_norm 0.4862 (0.4872/0.0339)	mem 48464MB
[2023-11-10 05:01:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][450/1251]	eta 0:32:02 lr 0.058571	time 2.3968 (2.3999)	model_time 2.3964 (2.3953)	loss 3.1552 (2.7688)	grad_norm 0.4746 (0.4871/0.0335)	mem 48464MB
[2023-11-10 05:01:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][460/1251]	eta 0:31:38 lr 0.058342	time 2.3936 (2.3998)	model_time 2.3931 (2.3953)	loss 1.7490 (2.7638)	grad_norm 0.5216 (0.4875/0.0332)	mem 48464MB
[2023-11-10 05:02:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][470/1251]	eta 0:31:14 lr 0.058114	time 2.3950 (2.3997)	model_time 2.3945 (2.3952)	loss 2.6702 (2.7644)	grad_norm 0.4712 (0.4873/0.0335)	mem 48464MB
[2023-11-10 05:02:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][480/1251]	eta 0:30:50 lr 0.057886	time 2.3961 (2.3995)	model_time 2.3957 (2.3952)	loss 2.8203 (2.7623)	grad_norm 0.4529 (0.4877/0.0337)	mem 48464MB
[2023-11-10 05:02:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][490/1251]	eta 0:30:25 lr 0.057659	time 2.3944 (2.3995)	model_time 2.3941 (2.3952)	loss 2.1417 (2.7613)	grad_norm 0.4787 (0.4887/0.0342)	mem 48464MB
[2023-11-10 05:03:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][500/1251]	eta 0:30:01 lr 0.057431	time 2.4011 (2.3994)	model_time 2.4008 (2.3952)	loss 3.1316 (2.7633)	grad_norm 0.4673 (0.4888/0.0344)	mem 48464MB
[2023-11-10 05:03:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][510/1251]	eta 0:29:37 lr 0.057204	time 2.3983 (2.3993)	model_time 2.3980 (2.3952)	loss 2.6886 (2.7619)	grad_norm 0.4553 (0.4886/0.0340)	mem 48464MB
[2023-11-10 05:04:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][520/1251]	eta 0:29:13 lr 0.056977	time 2.3968 (2.3993)	model_time 2.3963 (2.3952)	loss 2.6281 (2.7574)	grad_norm 0.4716 (0.4887/0.0350)	mem 48464MB
[2023-11-10 05:04:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][530/1251]	eta 0:28:49 lr 0.056751	time 2.3919 (2.3992)	model_time 2.3914 (2.3952)	loss 3.0047 (2.7612)	grad_norm 0.5176 (0.4877/0.0349)	mem 48464MB
[2023-11-10 05:04:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][540/1251]	eta 0:28:25 lr 0.056524	time 2.3932 (2.3991)	model_time 2.3929 (2.3952)	loss 2.7256 (2.7576)	grad_norm 0.4732 (0.4877/0.0347)	mem 48464MB
[2023-11-10 05:05:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][550/1251]	eta 0:28:01 lr 0.056298	time 2.3938 (2.3990)	model_time 2.3935 (2.3952)	loss 1.8089 (2.7513)	grad_norm 0.4733 (0.4876/0.0350)	mem 48464MB
[2023-11-10 05:05:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][560/1251]	eta 0:27:37 lr 0.056073	time 2.3942 (2.3990)	model_time 2.3938 (2.3952)	loss 3.0428 (2.7516)	grad_norm 0.5195 (0.4877/0.0353)	mem 48464MB
[2023-11-10 05:06:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][570/1251]	eta 0:27:13 lr 0.055847	time 2.3956 (2.3989)	model_time 2.3952 (2.3952)	loss 3.6407 (2.7571)	grad_norm 0.4988 (0.4881/0.0348)	mem 48464MB
[2023-11-10 05:06:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][580/1251]	eta 0:26:49 lr 0.055622	time 2.3976 (2.3989)	model_time 2.3971 (2.3952)	loss 2.8786 (2.7568)	grad_norm 0.4707 (0.4878/0.0344)	mem 48464MB
[2023-11-10 05:06:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][590/1251]	eta 0:26:25 lr 0.055397	time 2.3958 (2.3988)	model_time 2.3955 (2.3952)	loss 3.3008 (2.7650)	grad_norm 0.4959 (0.4886/0.0353)	mem 48464MB
[2023-11-10 05:07:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][600/1251]	eta 0:26:01 lr 0.055172	time 2.3920 (2.3987)	model_time 2.3917 (2.3952)	loss 2.9177 (2.7608)	grad_norm 0.4792 (0.4883/0.0353)	mem 48464MB
[2023-11-10 05:07:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][610/1251]	eta 0:25:37 lr 0.054948	time 2.3958 (2.3988)	model_time 2.3956 (2.3953)	loss 2.2404 (2.7579)	grad_norm 0.4738 (0.4875/0.0350)	mem 48464MB
[2023-11-10 05:08:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][620/1251]	eta 0:25:13 lr 0.054724	time 2.3954 (2.3988)	model_time 2.3951 (2.3953)	loss 3.7172 (2.7617)	grad_norm 0.4756 (0.4874/0.0347)	mem 48464MB
[2023-11-10 05:08:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][630/1251]	eta 0:24:49 lr 0.054500	time 2.3929 (2.3988)	model_time 2.3924 (2.3953)	loss 2.7571 (2.7582)	grad_norm 0.4655 (0.4871/0.0346)	mem 48464MB
[2023-11-10 05:08:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][640/1251]	eta 0:24:25 lr 0.054277	time 2.3963 (2.3987)	model_time 2.3960 (2.3953)	loss 2.1090 (2.7586)	grad_norm 0.5146 (0.4876/0.0343)	mem 48464MB
[2023-11-10 05:09:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][650/1251]	eta 0:24:01 lr 0.054054	time 2.3941 (2.3986)	model_time 2.3937 (2.3953)	loss 2.7761 (2.7591)	grad_norm 0.5436 (0.4876/0.0345)	mem 48464MB
[2023-11-10 05:09:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][660/1251]	eta 0:23:37 lr 0.053831	time 2.3955 (2.3985)	model_time 2.3953 (2.3952)	loss 3.2978 (2.7577)	grad_norm 0.4842 (0.4873/0.0348)	mem 48464MB
[2023-11-10 05:10:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][670/1251]	eta 0:23:13 lr 0.053608	time 2.3975 (2.3985)	model_time 2.3970 (2.3952)	loss 3.1615 (2.7599)	grad_norm 0.4750 (0.4867/0.0350)	mem 48464MB
[2023-11-10 05:10:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][680/1251]	eta 0:22:49 lr 0.053386	time 2.3928 (2.3984)	model_time 2.3924 (2.3952)	loss 3.5945 (2.7621)	grad_norm 0.5462 (0.4869/0.0350)	mem 48464MB
[2023-11-10 05:10:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][690/1251]	eta 0:22:25 lr 0.053164	time 2.3974 (2.3984)	model_time 2.3970 (2.3952)	loss 2.5848 (2.7595)	grad_norm 0.4785 (0.4871/0.0349)	mem 48464MB
[2023-11-10 05:11:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][700/1251]	eta 0:22:01 lr 0.052942	time 2.3960 (2.3983)	model_time 2.3955 (2.3952)	loss 3.5104 (2.7625)	grad_norm 0.4591 (0.4868/0.0348)	mem 48464MB
[2023-11-10 05:11:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][710/1251]	eta 0:21:37 lr 0.052721	time 2.3944 (2.3983)	model_time 2.3940 (2.3952)	loss 3.3888 (2.7614)	grad_norm 0.4929 (0.4870/0.0344)	mem 48464MB
[2023-11-10 05:12:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][720/1251]	eta 0:21:13 lr 0.052499	time 2.3921 (2.3982)	model_time 2.3918 (2.3952)	loss 1.9465 (2.7578)	grad_norm 0.5170 (0.4874/0.0341)	mem 48464MB
[2023-11-10 05:12:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][730/1251]	eta 0:20:49 lr 0.052279	time 2.3941 (2.3982)	model_time 2.3938 (2.3952)	loss 2.3950 (2.7556)	grad_norm 0.4674 (0.4865/0.0345)	mem 48464MB
[2023-11-10 05:12:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][740/1251]	eta 0:20:25 lr 0.052058	time 2.3968 (2.3982)	model_time 2.3965 (2.3952)	loss 3.0392 (2.7567)	grad_norm 0.4688 (0.4863/0.0342)	mem 48464MB
[2023-11-10 05:13:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][750/1251]	eta 0:20:01 lr 0.051838	time 2.3965 (2.3981)	model_time 2.3961 (2.3952)	loss 3.5804 (2.7589)	grad_norm 0.5209 (0.4861/0.0342)	mem 48464MB
[2023-11-10 05:13:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][760/1251]	eta 0:19:37 lr 0.051618	time 2.4019 (2.3981)	model_time 2.4013 (2.3952)	loss 3.3148 (2.7645)	grad_norm 0.5350 (0.4859/0.0344)	mem 48464MB
[2023-11-10 05:14:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][770/1251]	eta 0:19:13 lr 0.051398	time 2.3924 (2.3981)	model_time 2.3921 (2.3952)	loss 3.4456 (2.7651)	grad_norm 0.4493 (0.4859/0.0340)	mem 48464MB
[2023-11-10 05:14:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][780/1251]	eta 0:18:49 lr 0.051179	time 2.3929 (2.3981)	model_time 2.3926 (2.3952)	loss 2.7098 (2.7669)	grad_norm 0.5039 (0.4860/0.0339)	mem 48464MB
[2023-11-10 05:14:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][790/1251]	eta 0:18:25 lr 0.050960	time 2.3946 (2.3980)	model_time 2.3943 (2.3952)	loss 2.8547 (2.7684)	grad_norm 0.4527 (0.4850/0.0334)	mem 48464MB
[2023-11-10 05:15:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][800/1251]	eta 0:18:01 lr 0.050741	time 2.3909 (2.3980)	model_time 2.3905 (2.3952)	loss 2.0479 (2.7663)	grad_norm 0.4443 (0.4849/0.0332)	mem 48464MB
[2023-11-10 05:15:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][810/1251]	eta 0:17:37 lr 0.050523	time 2.3952 (2.3979)	model_time 2.3949 (2.3952)	loss 2.9650 (2.7648)	grad_norm 0.4662 (0.4848/0.0334)	mem 48464MB
[2023-11-10 05:16:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][820/1251]	eta 0:17:13 lr 0.050305	time 2.3924 (2.3979)	model_time 2.3921 (2.3952)	loss 1.7982 (2.7654)	grad_norm 0.4898 (0.4846/0.0326)	mem 48464MB
[2023-11-10 05:16:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][830/1251]	eta 0:16:49 lr 0.050087	time 2.3956 (2.3979)	model_time 2.3950 (2.3952)	loss 2.6797 (2.7672)	grad_norm 0.4422 (0.4852/0.0323)	mem 48464MB
[2023-11-10 05:16:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][840/1251]	eta 0:16:25 lr 0.049870	time 2.3907 (2.3979)	model_time 2.3903 (2.3952)	loss 2.9380 (2.7647)	grad_norm 0.5050 (0.4852/0.0326)	mem 48464MB
[2023-11-10 05:17:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][850/1251]	eta 0:16:01 lr 0.049652	time 2.4015 (2.3978)	model_time 2.4006 (2.3952)	loss 3.4520 (2.7664)	grad_norm 0.5322 (0.4859/0.0327)	mem 48464MB
[2023-11-10 05:17:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][860/1251]	eta 0:15:37 lr 0.049436	time 2.3938 (2.3978)	model_time 2.3935 (2.3952)	loss 2.0454 (2.7677)	grad_norm 0.4565 (0.4858/0.0327)	mem 48464MB
[2023-11-10 05:18:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][870/1251]	eta 0:15:13 lr 0.049219	time 2.3945 (2.3978)	model_time 2.3941 (2.3952)	loss 2.6007 (2.7691)	grad_norm 0.4860 (0.4851/0.0329)	mem 48464MB
[2023-11-10 05:18:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][880/1251]	eta 0:14:49 lr 0.049003	time 2.3920 (2.3978)	model_time 2.3916 (2.3952)	loss 2.9630 (2.7711)	grad_norm 0.4585 (0.4851/0.0327)	mem 48464MB
[2023-11-10 05:18:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][890/1251]	eta 0:14:25 lr 0.048787	time 2.5011 (2.3979)	model_time 2.5007 (2.3953)	loss 2.5649 (2.7742)	grad_norm 0.4690 (0.4848/0.0313)	mem 48464MB
[2023-11-10 05:19:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][900/1251]	eta 0:14:01 lr 0.048572	time 2.3951 (2.3978)	model_time 2.3948 (2.3953)	loss 2.6940 (2.7748)	grad_norm 0.4603 (0.4845/0.0315)	mem 48464MB
[2023-11-10 05:19:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][910/1251]	eta 0:13:37 lr 0.048356	time 2.3929 (2.3979)	model_time 2.3924 (2.3954)	loss 2.6148 (2.7725)	grad_norm 0.4643 (0.4851/0.0314)	mem 48464MB
[2023-11-10 05:20:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][920/1251]	eta 0:13:13 lr 0.048141	time 2.3983 (2.3979)	model_time 2.3981 (2.3954)	loss 2.9556 (2.7717)	grad_norm 0.4557 (0.4851/0.0316)	mem 48464MB
[2023-11-10 05:20:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][930/1251]	eta 0:12:49 lr 0.047927	time 2.3962 (2.3979)	model_time 2.3960 (2.3954)	loss 3.6257 (2.7715)	grad_norm 0.4787 (0.4853/0.0316)	mem 48464MB
[2023-11-10 05:20:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][940/1251]	eta 0:12:25 lr 0.047713	time 2.3907 (2.3979)	model_time 2.3904 (2.3954)	loss 2.3846 (2.7737)	grad_norm 0.5248 (0.4849/0.0317)	mem 48464MB
[2023-11-10 05:21:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][950/1251]	eta 0:12:01 lr 0.047499	time 2.3917 (2.3979)	model_time 2.3914 (2.3954)	loss 3.1986 (2.7767)	grad_norm 0.4429 (0.4846/0.0315)	mem 48464MB
[2023-11-10 05:21:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][960/1251]	eta 0:11:37 lr 0.047285	time 2.3954 (2.3978)	model_time 2.3950 (2.3954)	loss 2.9944 (2.7779)	grad_norm 0.4732 (0.4852/0.0314)	mem 48464MB
[2023-11-10 05:22:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][970/1251]	eta 0:11:13 lr 0.047072	time 2.3945 (2.3978)	model_time 2.3941 (2.3954)	loss 2.3357 (2.7752)	grad_norm 0.4463 (0.4852/0.0312)	mem 48464MB
[2023-11-10 05:22:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][980/1251]	eta 0:10:49 lr 0.046859	time 2.3927 (2.3978)	model_time 2.3924 (2.3954)	loss 2.9270 (2.7732)	grad_norm 0.4893 (0.4843/0.0308)	mem 48464MB
[2023-11-10 05:22:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][990/1251]	eta 0:10:25 lr 0.046647	time 2.3931 (2.3977)	model_time 2.3928 (2.3954)	loss 3.6853 (2.7779)	grad_norm 0.4758 (0.4841/0.0307)	mem 48464MB
[2023-11-10 05:23:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1000/1251]	eta 0:10:01 lr 0.046434	time 2.3935 (2.3977)	model_time 2.3931 (2.3954)	loss 3.6454 (2.7799)	grad_norm 0.4911 (0.4840/0.0310)	mem 48464MB
[2023-11-10 05:23:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1010/1251]	eta 0:09:37 lr 0.046222	time 2.3936 (2.3977)	model_time 2.3933 (2.3954)	loss 1.5971 (2.7794)	grad_norm 0.4645 (0.4842/0.0312)	mem 48464MB
[2023-11-10 05:24:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1020/1251]	eta 0:09:13 lr 0.046011	time 2.4021 (2.3977)	model_time 2.4017 (2.3954)	loss 1.6308 (2.7791)	grad_norm 0.4616 (0.4850/0.0319)	mem 48464MB
[2023-11-10 05:24:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1030/1251]	eta 0:08:49 lr 0.045800	time 2.3919 (2.3977)	model_time 2.3915 (2.3954)	loss 2.7441 (2.7805)	grad_norm 0.5201 (0.4861/0.0316)	mem 48464MB
[2023-11-10 05:24:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1040/1251]	eta 0:08:25 lr 0.045589	time 2.3950 (2.3976)	model_time 2.3947 (2.3954)	loss 2.4851 (2.7825)	grad_norm 0.5217 (0.4862/0.0314)	mem 48464MB
[2023-11-10 05:25:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1050/1251]	eta 0:08:01 lr 0.045378	time 2.3964 (2.3976)	model_time 2.3961 (2.3954)	loss 2.9750 (2.7856)	grad_norm 0.4569 (0.4859/0.0311)	mem 48464MB
[2023-11-10 05:25:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1060/1251]	eta 0:07:37 lr 0.045168	time 2.3946 (2.3976)	model_time 2.3943 (2.3954)	loss 2.7446 (2.7857)	grad_norm 0.5201 (0.4860/0.0313)	mem 48464MB
[2023-11-10 05:26:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1070/1251]	eta 0:07:13 lr 0.044958	time 2.3959 (2.3976)	model_time 2.3955 (2.3954)	loss 2.4176 (2.7842)	grad_norm 0.4519 (0.4857/0.0314)	mem 48464MB
[2023-11-10 05:26:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1080/1251]	eta 0:06:50 lr 0.044749	time 2.3994 (2.3977)	model_time 2.3988 (2.3955)	loss 3.5528 (2.7832)	grad_norm 0.4752 (0.4854/0.0315)	mem 48464MB
[2023-11-10 05:26:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1090/1251]	eta 0:06:26 lr 0.044540	time 2.3908 (2.3977)	model_time 2.3904 (2.3955)	loss 2.2764 (2.7860)	grad_norm 0.5184 (0.4856/0.0315)	mem 48464MB
[2023-11-10 05:27:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1100/1251]	eta 0:06:02 lr 0.044331	time 2.3943 (2.3976)	model_time 2.3940 (2.3955)	loss 3.2469 (2.7878)	grad_norm 0.5359 (0.4865/0.0315)	mem 48464MB
[2023-11-10 05:27:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1110/1251]	eta 0:05:38 lr 0.044122	time 2.3954 (2.3977)	model_time 2.3949 (2.3956)	loss 3.1343 (2.7881)	grad_norm 0.4816 (0.4871/0.0314)	mem 48464MB
[2023-11-10 05:28:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1120/1251]	eta 0:05:14 lr 0.043914	time 2.3955 (2.3977)	model_time 2.3950 (2.3955)	loss 2.5180 (2.7895)	grad_norm 0.4830 (0.4870/0.0313)	mem 48464MB
[2023-11-10 05:28:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1130/1251]	eta 0:04:50 lr 0.043707	time 2.3921 (2.3976)	model_time 2.3918 (2.3955)	loss 2.3188 (2.7873)	grad_norm 0.4825 (0.4870/0.0313)	mem 48464MB
[2023-11-10 05:28:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1140/1251]	eta 0:04:26 lr 0.043499	time 2.3938 (2.3976)	model_time 2.3935 (2.3955)	loss 1.5405 (2.7886)	grad_norm 0.4917 (0.4873/0.0309)	mem 48464MB
[2023-11-10 05:29:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1150/1251]	eta 0:04:02 lr 0.043292	time 2.3937 (2.3976)	model_time 2.3931 (2.3955)	loss 2.3470 (2.7871)	grad_norm 0.5068 (0.4866/0.0306)	mem 48464MB
[2023-11-10 05:29:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1160/1251]	eta 0:03:38 lr 0.043085	time 2.3958 (2.3976)	model_time 2.3953 (2.3955)	loss 3.7529 (2.7874)	grad_norm 0.4843 (0.4863/0.0300)	mem 48464MB
[2023-11-10 05:30:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1170/1251]	eta 0:03:14 lr 0.042879	time 2.3926 (2.3976)	model_time 2.3922 (2.3955)	loss 2.2138 (2.7859)	grad_norm 0.5057 (0.4864/0.0296)	mem 48464MB
[2023-11-10 05:30:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1180/1251]	eta 0:02:50 lr 0.042673	time 2.3926 (2.3975)	model_time 2.3923 (2.3955)	loss 2.7396 (2.7840)	grad_norm 0.5050 (0.4865/0.0303)	mem 48464MB
[2023-11-10 05:30:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1190/1251]	eta 0:02:26 lr 0.042468	time 2.3986 (2.3975)	model_time 2.3980 (2.3955)	loss 2.3677 (2.7823)	grad_norm 0.5191 (0.4871/0.0319)	mem 48464MB
[2023-11-10 05:31:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1200/1251]	eta 0:02:02 lr 0.042262	time 2.3959 (2.3975)	model_time 2.3956 (2.3955)	loss 2.5193 (2.7833)	grad_norm 0.4868 (0.4870/0.0317)	mem 48464MB
[2023-11-10 05:31:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1210/1251]	eta 0:01:38 lr 0.042058	time 2.3969 (2.3975)	model_time 2.3967 (2.3955)	loss 2.5720 (2.7832)	grad_norm 0.5011 (0.4869/0.0316)	mem 48464MB
[2023-11-10 05:32:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1220/1251]	eta 0:01:14 lr 0.041853	time 2.3959 (2.3975)	model_time 2.3949 (2.3954)	loss 1.7629 (2.7836)	grad_norm 0.4700 (0.4873/0.0313)	mem 48464MB
[2023-11-10 05:32:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1230/1251]	eta 0:00:50 lr 0.041649	time 2.3932 (2.3974)	model_time 2.3928 (2.3954)	loss 3.7375 (2.7862)	grad_norm 0.6035 (0.4880/0.0322)	mem 48464MB
[2023-11-10 05:32:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1240/1251]	eta 0:00:26 lr 0.041445	time 2.3933 (2.3974)	model_time 2.3931 (2.3954)	loss 2.7892 (2.7868)	grad_norm 0.4626 (0.4877/0.0321)	mem 48464MB
[2023-11-10 05:33:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [6/10][1250/1251]	eta 0:00:02 lr 0.041242	time 2.3929 (2.3974)	model_time 2.3928 (2.3954)	loss 3.0978 (2.7882)	grad_norm 0.5105 (0.4876/0.0327)	mem 48464MB
[2023-11-10 05:33:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 515): INFO EPOCH 6 training takes 0:49:59
[2023-11-10 05:33:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_6.pth saving......
[2023-11-10 05:35:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_6.pth saved !!!
[2023-11-10 05:35:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.785 (3.785)	Loss 0.6021 (0.6021)	Acc@1 88.281 (88.281)	Acc@5 98.828 (98.828)	Mem 48464MB
[2023-11-10 05:35:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.247 (2.377)	Loss 0.6943 (0.6001)	Acc@1 86.621 (88.281)	Acc@5 97.852 (98.402)	Mem 48464MB
[2023-11-10 05:35:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.252 (2.317)	Loss 0.5347 (0.5999)	Acc@1 89.746 (88.216)	Acc@5 98.730 (98.414)	Mem 48464MB
[2023-11-10 05:36:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.252 (2.296)	Loss 0.6470 (0.6043)	Acc@1 86.914 (88.064)	Acc@5 98.242 (98.434)	Mem 48464MB
[2023-11-10 05:36:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.252 (2.285)	Loss 0.6621 (0.6063)	Acc@1 87.012 (88.057)	Acc@5 97.754 (98.418)	Mem 48464MB
[2023-11-10 05:36:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:6] * Acc@1 88.076 Acc@5 98.424
[2023-11-10 05:36:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 339): INFO Accuracy of the network on the 50000 test images: 88.1%
[2023-11-10 05:36:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saving......
[2023-11-10 05:38:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saved !!!
[2023-11-10 05:38:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 354): INFO Max accuracy: 88.08%
[2023-11-10 05:38:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.777 (3.777)	Loss 0.5986 (0.5986)	Acc@1 88.770 (88.770)	Acc@5 98.633 (98.633)	Mem 48464MB
[2023-11-10 05:39:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.246 (2.376)	Loss 0.6938 (0.5994)	Acc@1 86.719 (88.343)	Acc@5 97.949 (98.411)	Mem 48464MB
[2023-11-10 05:39:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.254 (2.316)	Loss 0.5303 (0.5994)	Acc@1 89.941 (88.300)	Acc@5 98.730 (98.410)	Mem 48464MB
[2023-11-10 05:39:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.252 (2.296)	Loss 0.6509 (0.6041)	Acc@1 86.621 (88.130)	Acc@5 98.242 (98.453)	Mem 48464MB
[2023-11-10 05:40:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.252 (2.285)	Loss 0.6621 (0.6063)	Acc@1 87.109 (88.138)	Acc@5 98.145 (98.445)	Mem 48464MB
[2023-11-10 05:40:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:6] * Acc@1 88.168 Acc@5 98.460
[2023-11-10 05:40:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 359): INFO Accuracy of the ema network on the 50000 test images: 88.2%
[2023-11-10 05:40:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saving......
[2023-11-10 05:42:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saved !!!
[2023-11-10 05:42:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 374): INFO Max ema accuracy: 88.17%
[2023-11-10 05:42:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][0/1251]	eta 1:19:55 lr 0.041221	time 3.8332 (3.8332)	model_time 2.3727 (2.3727)	loss 2.7833 (2.7833)	grad_norm 0.5078 (0.5078/0.0000)	mem 48464MB
[2023-11-10 05:42:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][10/1251]	eta 0:51:56 lr 0.041018	time 2.3898 (2.5114)	model_time 2.3893 (2.3782)	loss 3.6629 (2.9244)	grad_norm 0.4391 (0.4870/0.0358)	mem 48464MB
[2023-11-10 05:43:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][20/1251]	eta 0:50:21 lr 0.040816	time 2.3926 (2.4545)	model_time 2.3921 (2.3846)	loss 3.2144 (2.8571)	grad_norm 0.4734 (0.4805/0.0332)	mem 48464MB
[2023-11-10 05:43:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][30/1251]	eta 0:49:38 lr 0.040614	time 2.5241 (2.4391)	model_time 2.5239 (2.3916)	loss 2.9731 (2.8335)	grad_norm 0.4970 (0.4841/0.0325)	mem 48464MB
[2023-11-10 05:43:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][40/1251]	eta 0:49:00 lr 0.040412	time 2.3952 (2.4280)	model_time 2.3948 (2.3919)	loss 2.4362 (2.8274)	grad_norm 0.4310 (0.4847/0.0326)	mem 48464MB
[2023-11-10 05:44:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][50/1251]	eta 0:48:27 lr 0.040210	time 2.3940 (2.4213)	model_time 2.3936 (2.3921)	loss 2.5623 (2.7939)	grad_norm 0.4528 (0.4800/0.0327)	mem 48464MB
[2023-11-10 05:44:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][60/1251]	eta 0:47:58 lr 0.040009	time 2.3914 (2.4168)	model_time 2.3910 (2.3924)	loss 2.3859 (2.7882)	grad_norm 0.4449 (0.4826/0.0329)	mem 48464MB
[2023-11-10 05:45:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][70/1251]	eta 0:47:30 lr 0.039808	time 2.3981 (2.4137)	model_time 2.3974 (2.3927)	loss 3.0393 (2.8051)	grad_norm 0.4784 (0.4831/0.0323)	mem 48464MB
[2023-11-10 05:45:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][80/1251]	eta 0:47:03 lr 0.039608	time 2.3938 (2.4115)	model_time 2.3935 (2.3930)	loss 2.6047 (2.7952)	grad_norm 0.5415 (0.4855/0.0328)	mem 48464MB
[2023-11-10 05:45:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][90/1251]	eta 0:46:37 lr 0.039408	time 2.3954 (2.4097)	model_time 2.3950 (2.3931)	loss 1.5586 (2.7996)	grad_norm 0.4269 (0.4840/0.0333)	mem 48464MB
[2023-11-10 05:46:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][100/1251]	eta 0:46:11 lr 0.039209	time 2.3919 (2.4081)	model_time 2.3914 (2.3932)	loss 2.2420 (2.7996)	grad_norm 0.4240 (0.4834/0.0330)	mem 48464MB
[2023-11-10 05:46:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][110/1251]	eta 0:45:46 lr 0.039009	time 2.3941 (2.4069)	model_time 2.3938 (2.3933)	loss 2.5385 (2.7938)	grad_norm 0.4758 (0.4843/0.0334)	mem 48464MB
[2023-11-10 05:47:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][120/1251]	eta 0:45:21 lr 0.038811	time 2.3948 (2.4059)	model_time 2.3944 (2.3934)	loss 2.6161 (2.7912)	grad_norm 0.4523 (0.4843/0.0340)	mem 48464MB
[2023-11-10 05:47:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][130/1251]	eta 0:44:56 lr 0.038612	time 2.3968 (2.4051)	model_time 2.3956 (2.3935)	loss 2.9055 (2.7877)	grad_norm 0.4660 (0.4849/0.0332)	mem 48464MB
[2023-11-10 05:47:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][140/1251]	eta 0:44:31 lr 0.038414	time 2.3942 (2.4044)	model_time 2.3937 (2.3935)	loss 3.1366 (2.7734)	grad_norm 0.4813 (0.4841/0.0331)	mem 48464MB
[2023-11-10 05:48:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][150/1251]	eta 0:44:06 lr 0.038216	time 2.3942 (2.4038)	model_time 2.3939 (2.3936)	loss 3.3927 (2.7626)	grad_norm 0.5013 (0.4841/0.0332)	mem 48464MB
[2023-11-10 05:48:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][160/1251]	eta 0:43:42 lr 0.038019	time 2.3983 (2.4041)	model_time 2.3979 (2.3946)	loss 2.8557 (2.7729)	grad_norm 0.4614 (0.4840/0.0336)	mem 48464MB
[2023-11-10 05:49:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][170/1251]	eta 0:43:18 lr 0.037822	time 2.3960 (2.4036)	model_time 2.3955 (2.3946)	loss 2.6684 (2.7799)	grad_norm 0.4967 (0.4844/0.0337)	mem 48464MB
[2023-11-10 05:49:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][180/1251]	eta 0:42:53 lr 0.037626	time 2.3962 (2.4033)	model_time 2.3959 (2.3947)	loss 3.0585 (2.7940)	grad_norm 0.5211 (0.4843/0.0335)	mem 48464MB
[2023-11-10 05:49:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][190/1251]	eta 0:42:29 lr 0.037430	time 2.3989 (2.4029)	model_time 2.3985 (2.3948)	loss 3.3967 (2.7864)	grad_norm 0.5189 (0.4828/0.0344)	mem 48464MB
[2023-11-10 05:50:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][200/1251]	eta 0:42:05 lr 0.037234	time 2.3916 (2.4025)	model_time 2.3912 (2.3948)	loss 2.9154 (2.7828)	grad_norm 0.5656 (0.4820/0.0346)	mem 48464MB
[2023-11-10 05:50:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][210/1251]	eta 0:41:40 lr 0.037039	time 2.3958 (2.4022)	model_time 2.3954 (2.3948)	loss 3.5358 (2.8053)	grad_norm 0.4636 (0.4823/0.0345)	mem 48464MB
[2023-11-10 05:51:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][220/1251]	eta 0:41:16 lr 0.036844	time 2.3959 (2.4019)	model_time 2.3956 (2.3948)	loss 2.8221 (2.7921)	grad_norm 0.4486 (0.4817/0.0342)	mem 48464MB
[2023-11-10 05:51:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][230/1251]	eta 0:40:51 lr 0.036649	time 2.3941 (2.4016)	model_time 2.3938 (2.3947)	loss 3.3085 (2.7981)	grad_norm 0.4747 (0.4815/0.0337)	mem 48464MB
[2023-11-10 05:51:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][240/1251]	eta 0:40:27 lr 0.036455	time 2.3914 (2.4013)	model_time 2.3910 (2.3948)	loss 2.1124 (2.7897)	grad_norm 0.4674 (0.4805/0.0335)	mem 48464MB
[2023-11-10 05:52:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][250/1251]	eta 0:40:03 lr 0.036261	time 2.3974 (2.4011)	model_time 2.3971 (2.3948)	loss 2.1702 (2.7722)	grad_norm 0.4613 (0.4804/0.0333)	mem 48464MB
[2023-11-10 05:52:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][260/1251]	eta 0:39:39 lr 0.036068	time 2.3959 (2.4009)	model_time 2.3956 (2.3948)	loss 3.4888 (2.7704)	grad_norm 0.5163 (0.4809/0.0332)	mem 48464MB
[2023-11-10 05:53:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][270/1251]	eta 0:39:15 lr 0.035875	time 2.3966 (2.4007)	model_time 2.3963 (2.3949)	loss 3.2194 (2.7773)	grad_norm 0.4789 (0.4807/0.0332)	mem 48464MB
[2023-11-10 05:53:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][280/1251]	eta 0:38:50 lr 0.035683	time 2.3952 (2.4005)	model_time 2.3949 (2.3949)	loss 2.5076 (2.7740)	grad_norm 0.4329 (0.4802/0.0331)	mem 48464MB
[2023-11-10 05:53:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][290/1251]	eta 0:38:26 lr 0.035491	time 2.3954 (2.4003)	model_time 2.3951 (2.3948)	loss 1.5302 (2.7634)	grad_norm 0.5765 (0.4806/0.0335)	mem 48464MB
[2023-11-10 05:54:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][300/1251]	eta 0:38:02 lr 0.035299	time 2.3911 (2.4002)	model_time 2.3907 (2.3948)	loss 2.6090 (2.7629)	grad_norm 0.4970 (0.4806/0.0336)	mem 48464MB
[2023-11-10 05:54:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][310/1251]	eta 0:37:38 lr 0.035108	time 2.3926 (2.4000)	model_time 2.3921 (2.3948)	loss 2.6904 (2.7590)	grad_norm 0.5179 (0.4815/0.0337)	mem 48464MB
[2023-11-10 05:55:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][320/1251]	eta 0:37:14 lr 0.034917	time 2.3937 (2.3998)	model_time 2.3934 (2.3948)	loss 2.4799 (2.7605)	grad_norm 0.5216 (0.4819/0.0344)	mem 48464MB
[2023-11-10 05:55:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][330/1251]	eta 0:36:50 lr 0.034726	time 2.3926 (2.3997)	model_time 2.3923 (2.3948)	loss 1.9106 (2.7583)	grad_norm 0.4734 (0.4815/0.0343)	mem 48464MB
[2023-11-10 05:55:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][340/1251]	eta 0:36:26 lr 0.034536	time 2.3935 (2.3998)	model_time 2.3931 (2.3951)	loss 2.5836 (2.7590)	grad_norm 0.4667 (0.4811/0.0344)	mem 48464MB
[2023-11-10 05:56:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][350/1251]	eta 0:36:02 lr 0.034347	time 2.3947 (2.3997)	model_time 2.3944 (2.3951)	loss 3.4759 (2.7663)	grad_norm 0.4952 (0.4818/0.0344)	mem 48464MB
[2023-11-10 05:56:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][360/1251]	eta 0:35:38 lr 0.034158	time 2.3952 (2.3996)	model_time 2.3949 (2.3951)	loss 1.7658 (2.7593)	grad_norm 0.4753 (0.4810/0.0342)	mem 48464MB
[2023-11-10 05:57:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][370/1251]	eta 0:35:13 lr 0.033969	time 2.3969 (2.3995)	model_time 2.3966 (2.3951)	loss 2.8597 (2.7576)	grad_norm 0.5610 (0.4815/0.0350)	mem 48464MB
[2023-11-10 05:57:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][380/1251]	eta 0:34:49 lr 0.033780	time 2.3942 (2.3994)	model_time 2.3939 (2.3950)	loss 2.2310 (2.7553)	grad_norm 0.4418 (0.4805/0.0348)	mem 48464MB
[2023-11-10 05:57:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][390/1251]	eta 0:34:25 lr 0.033592	time 2.3992 (2.3992)	model_time 2.3990 (2.3950)	loss 2.5844 (2.7570)	grad_norm 0.4859 (0.4809/0.0348)	mem 48464MB
[2023-11-10 05:58:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][400/1251]	eta 0:34:01 lr 0.033405	time 2.3944 (2.3991)	model_time 2.3940 (2.3950)	loss 2.6281 (2.7533)	grad_norm 0.5337 (0.4816/0.0353)	mem 48464MB
[2023-11-10 05:58:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][410/1251]	eta 0:33:37 lr 0.033218	time 2.3941 (2.3990)	model_time 2.3937 (2.3950)	loss 2.9174 (2.7580)	grad_norm 0.4879 (0.4816/0.0352)	mem 48464MB
[2023-11-10 05:59:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][420/1251]	eta 0:33:13 lr 0.033031	time 2.3991 (2.3989)	model_time 2.3988 (2.3950)	loss 3.0045 (2.7504)	grad_norm 0.5065 (0.4819/0.0350)	mem 48464MB
[2023-11-10 05:59:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][430/1251]	eta 0:32:49 lr 0.032845	time 2.3953 (2.3988)	model_time 2.3951 (2.3950)	loss 2.9442 (2.7485)	grad_norm 0.4535 (0.4811/0.0353)	mem 48464MB
[2023-11-10 05:59:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][440/1251]	eta 0:32:25 lr 0.032659	time 2.3948 (2.3988)	model_time 2.3943 (2.3950)	loss 3.8821 (2.7531)	grad_norm 0.4688 (0.4815/0.0351)	mem 48464MB
[2023-11-10 06:00:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][450/1251]	eta 0:32:01 lr 0.032473	time 2.3947 (2.3987)	model_time 2.3943 (2.3950)	loss 2.9585 (2.7494)	grad_norm 0.5189 (0.4809/0.0349)	mem 48464MB
[2023-11-10 06:00:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][460/1251]	eta 0:31:37 lr 0.032288	time 2.3903 (2.3986)	model_time 2.3899 (2.3950)	loss 2.9683 (2.7550)	grad_norm 0.4865 (0.4808/0.0347)	mem 48464MB
[2023-11-10 06:01:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][470/1251]	eta 0:31:13 lr 0.032104	time 2.3907 (2.3985)	model_time 2.3901 (2.3949)	loss 3.3306 (2.7586)	grad_norm 0.5154 (0.4806/0.0345)	mem 48464MB
[2023-11-10 06:01:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][480/1251]	eta 0:30:49 lr 0.031920	time 2.4020 (2.3984)	model_time 2.4017 (2.3949)	loss 3.4611 (2.7554)	grad_norm 0.5406 (0.4807/0.0346)	mem 48464MB
[2023-11-10 06:01:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][490/1251]	eta 0:30:25 lr 0.031736	time 2.3965 (2.3984)	model_time 2.3961 (2.3950)	loss 2.3743 (2.7512)	grad_norm 0.4588 (0.4814/0.0341)	mem 48464MB
[2023-11-10 06:02:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][500/1251]	eta 0:30:01 lr 0.031553	time 2.3987 (2.3983)	model_time 2.3984 (2.3949)	loss 2.6831 (2.7525)	grad_norm 0.5018 (0.4818/0.0338)	mem 48464MB
[2023-11-10 06:02:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][510/1251]	eta 0:29:37 lr 0.031370	time 2.3959 (2.3985)	model_time 2.3957 (2.3952)	loss 2.9338 (2.7573)	grad_norm 0.4710 (0.4817/0.0335)	mem 48464MB
[2023-11-10 06:03:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][520/1251]	eta 0:29:13 lr 0.031187	time 2.3977 (2.3985)	model_time 2.3974 (2.3952)	loss 3.7097 (2.7575)	grad_norm 0.5376 (0.4822/0.0338)	mem 48464MB
[2023-11-10 06:03:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][530/1251]	eta 0:28:49 lr 0.031005	time 2.3943 (2.3984)	model_time 2.3941 (2.3952)	loss 2.3227 (2.7584)	grad_norm 0.5604 (0.4826/0.0344)	mem 48464MB
[2023-11-10 06:03:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][540/1251]	eta 0:28:25 lr 0.030824	time 2.3962 (2.3983)	model_time 2.3960 (2.3952)	loss 2.2595 (2.7541)	grad_norm 0.4923 (0.4832/0.0346)	mem 48464MB
[2023-11-10 06:04:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][550/1251]	eta 0:28:01 lr 0.030643	time 2.3967 (2.3983)	model_time 2.3964 (2.3952)	loss 2.7738 (2.7587)	grad_norm 0.4571 (0.4834/0.0347)	mem 48464MB
[2023-11-10 06:04:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][560/1251]	eta 0:27:37 lr 0.030462	time 2.3950 (2.3982)	model_time 2.3946 (2.3951)	loss 2.9385 (2.7540)	grad_norm 0.5123 (0.4833/0.0352)	mem 48464MB
[2023-11-10 06:05:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][570/1251]	eta 0:27:13 lr 0.030282	time 2.3951 (2.3982)	model_time 2.3947 (2.3951)	loss 2.1161 (2.7512)	grad_norm 0.5563 (0.4838/0.0356)	mem 48464MB
[2023-11-10 06:05:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][580/1251]	eta 0:26:49 lr 0.030102	time 2.3950 (2.3981)	model_time 2.3947 (2.3951)	loss 3.1359 (2.7484)	grad_norm 0.4795 (0.4836/0.0357)	mem 48464MB
[2023-11-10 06:05:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][590/1251]	eta 0:26:25 lr 0.029923	time 2.3955 (2.3980)	model_time 2.3951 (2.3951)	loss 3.0102 (2.7471)	grad_norm 0.5228 (0.4834/0.0358)	mem 48464MB
[2023-11-10 06:06:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][600/1251]	eta 0:26:01 lr 0.029744	time 2.3985 (2.3980)	model_time 2.3982 (2.3951)	loss 2.4247 (2.7463)	grad_norm 0.4769 (0.4830/0.0356)	mem 48464MB
[2023-11-10 06:06:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][610/1251]	eta 0:25:37 lr 0.029565	time 2.3949 (2.3979)	model_time 2.3947 (2.3950)	loss 3.0210 (2.7540)	grad_norm 0.4713 (0.4822/0.0356)	mem 48464MB
[2023-11-10 06:07:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][620/1251]	eta 0:25:13 lr 0.029387	time 2.3964 (2.3978)	model_time 2.3962 (2.3950)	loss 3.1417 (2.7503)	grad_norm 0.5881 (0.4823/0.0354)	mem 48464MB
[2023-11-10 06:07:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][630/1251]	eta 0:24:49 lr 0.029209	time 2.3926 (2.3978)	model_time 2.3922 (2.3950)	loss 2.4604 (2.7522)	grad_norm 0.5051 (0.4828/0.0352)	mem 48464MB
[2023-11-10 06:07:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][640/1251]	eta 0:24:25 lr 0.029032	time 2.3952 (2.3978)	model_time 2.3948 (2.3950)	loss 2.5720 (2.7543)	grad_norm 0.4445 (0.4832/0.0353)	mem 48464MB
[2023-11-10 06:08:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][650/1251]	eta 0:24:01 lr 0.028856	time 2.3928 (2.3977)	model_time 2.3925 (2.3950)	loss 3.2982 (2.7543)	grad_norm 0.4406 (0.4832/0.0352)	mem 48464MB
[2023-11-10 06:08:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][660/1251]	eta 0:23:37 lr 0.028679	time 2.3962 (2.3977)	model_time 2.3958 (2.3950)	loss 3.5825 (2.7594)	grad_norm 0.4930 (0.4841/0.0356)	mem 48464MB
[2023-11-10 06:09:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][670/1251]	eta 0:23:13 lr 0.028504	time 2.3970 (2.3977)	model_time 2.3965 (2.3950)	loss 2.9030 (2.7649)	grad_norm 0.4503 (0.4835/0.0347)	mem 48464MB
[2023-11-10 06:09:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][680/1251]	eta 0:22:49 lr 0.028328	time 2.3962 (2.3976)	model_time 2.3958 (2.3950)	loss 3.0076 (2.7650)	grad_norm 0.4864 (0.4838/0.0347)	mem 48464MB
[2023-11-10 06:09:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][690/1251]	eta 0:22:25 lr 0.028153	time 2.5198 (2.3978)	model_time 2.5195 (2.3952)	loss 3.6459 (2.7673)	grad_norm 0.4864 (0.4842/0.0343)	mem 48464MB
[2023-11-10 06:10:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][700/1251]	eta 0:22:01 lr 0.027979	time 2.3960 (2.3977)	model_time 2.3956 (2.3952)	loss 2.7802 (2.7642)	grad_norm 0.4629 (0.4832/0.0338)	mem 48464MB
[2023-11-10 06:10:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][710/1251]	eta 0:21:37 lr 0.027805	time 2.3931 (2.3977)	model_time 2.3926 (2.3952)	loss 1.8678 (2.7654)	grad_norm 0.4738 (0.4828/0.0340)	mem 48464MB
[2023-11-10 06:11:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][720/1251]	eta 0:21:13 lr 0.027631	time 2.3939 (2.3977)	model_time 2.3934 (2.3952)	loss 2.1429 (2.7667)	grad_norm 0.4615 (0.4821/0.0335)	mem 48464MB
[2023-11-10 06:11:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][730/1251]	eta 0:20:49 lr 0.027458	time 2.3966 (2.3976)	model_time 2.3962 (2.3952)	loss 3.1075 (2.7707)	grad_norm 0.4940 (0.4832/0.0340)	mem 48464MB
[2023-11-10 06:11:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][740/1251]	eta 0:20:25 lr 0.027286	time 2.3977 (2.3976)	model_time 2.3973 (2.3952)	loss 2.1462 (2.7707)	grad_norm 0.4607 (0.4835/0.0347)	mem 48464MB
[2023-11-10 06:12:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][750/1251]	eta 0:20:01 lr 0.027113	time 2.3908 (2.3976)	model_time 2.3904 (2.3951)	loss 2.1952 (2.7627)	grad_norm 0.4441 (0.4842/0.0351)	mem 48464MB
[2023-11-10 06:12:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][760/1251]	eta 0:19:37 lr 0.026942	time 2.3921 (2.3975)	model_time 2.3918 (2.3951)	loss 2.4219 (2.7589)	grad_norm 0.4458 (0.4842/0.0350)	mem 48464MB
[2023-11-10 06:13:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][770/1251]	eta 0:19:13 lr 0.026771	time 2.3963 (2.3975)	model_time 2.3960 (2.3951)	loss 3.7150 (2.7622)	grad_norm 0.4878 (0.4840/0.0349)	mem 48464MB
[2023-11-10 06:13:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][780/1251]	eta 0:18:49 lr 0.026600	time 2.3947 (2.3975)	model_time 2.3943 (2.3951)	loss 2.7944 (2.7650)	grad_norm 0.4538 (0.4834/0.0348)	mem 48464MB
[2023-11-10 06:13:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][790/1251]	eta 0:18:25 lr 0.026429	time 2.3960 (2.3974)	model_time 2.3957 (2.3951)	loss 2.6423 (2.7642)	grad_norm 0.4653 (0.4832/0.0350)	mem 48464MB
[2023-11-10 06:14:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][800/1251]	eta 0:18:01 lr 0.026260	time 2.3949 (2.3974)	model_time 2.3946 (2.3951)	loss 3.6844 (2.7649)	grad_norm 0.5185 (0.4830/0.0352)	mem 48464MB
[2023-11-10 06:14:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][810/1251]	eta 0:17:37 lr 0.026090	time 2.3917 (2.3975)	model_time 2.3912 (2.3952)	loss 1.8738 (2.7637)	grad_norm 0.4237 (0.4825/0.0355)	mem 48464MB
[2023-11-10 06:15:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][820/1251]	eta 0:17:13 lr 0.025921	time 2.3958 (2.3975)	model_time 2.3954 (2.3952)	loss 1.6538 (2.7642)	grad_norm 0.4646 (0.4821/0.0356)	mem 48464MB
[2023-11-10 06:15:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][830/1251]	eta 0:16:49 lr 0.025753	time 2.3929 (2.3975)	model_time 2.3922 (2.3952)	loss 2.3979 (2.7650)	grad_norm 0.4692 (0.4826/0.0357)	mem 48464MB
[2023-11-10 06:15:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][840/1251]	eta 0:16:25 lr 0.025585	time 2.3923 (2.3974)	model_time 2.3917 (2.3952)	loss 2.6494 (2.7625)	grad_norm 0.4675 (0.4822/0.0355)	mem 48464MB
[2023-11-10 06:16:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][850/1251]	eta 0:16:01 lr 0.025417	time 2.3953 (2.3974)	model_time 2.3950 (2.3952)	loss 2.6405 (2.7633)	grad_norm 0.4889 (0.4822/0.0354)	mem 48464MB
[2023-11-10 06:16:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][860/1251]	eta 0:15:37 lr 0.025250	time 2.3926 (2.3974)	model_time 2.3923 (2.3952)	loss 2.7433 (2.7614)	grad_norm 0.4907 (0.4820/0.0356)	mem 48464MB
[2023-11-10 06:17:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][870/1251]	eta 0:15:13 lr 0.025084	time 2.3894 (2.3973)	model_time 2.3891 (2.3952)	loss 1.6559 (2.7633)	grad_norm 0.4660 (0.4818/0.0349)	mem 48464MB
[2023-11-10 06:17:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][880/1251]	eta 0:14:49 lr 0.024918	time 2.4007 (2.3973)	model_time 2.4006 (2.3952)	loss 2.7287 (2.7623)	grad_norm 0.4633 (0.4825/0.0351)	mem 48464MB
[2023-11-10 06:17:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][890/1251]	eta 0:14:25 lr 0.024752	time 2.3952 (2.3973)	model_time 2.3948 (2.3952)	loss 2.9655 (2.7592)	grad_norm 0.4430 (0.4820/0.0351)	mem 48464MB
[2023-11-10 06:18:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][900/1251]	eta 0:14:01 lr 0.024587	time 2.3946 (2.3973)	model_time 2.3941 (2.3952)	loss 2.1013 (2.7619)	grad_norm 0.5214 (0.4830/0.0357)	mem 48464MB
[2023-11-10 06:18:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][910/1251]	eta 0:13:37 lr 0.024422	time 2.3975 (2.3974)	model_time 2.3969 (2.3954)	loss 3.1532 (2.7645)	grad_norm 0.4634 (0.4825/0.0352)	mem 48464MB
[2023-11-10 06:19:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][920/1251]	eta 0:13:13 lr 0.024258	time 2.3937 (2.3974)	model_time 2.3934 (2.3954)	loss 2.6806 (2.7671)	grad_norm 0.4711 (0.4821/0.0346)	mem 48464MB
[2023-11-10 06:19:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][930/1251]	eta 0:12:49 lr 0.024094	time 2.3954 (2.3974)	model_time 2.3950 (2.3953)	loss 2.9383 (2.7689)	grad_norm 0.4973 (0.4827/0.0350)	mem 48464MB
[2023-11-10 06:19:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][940/1251]	eta 0:12:25 lr 0.023931	time 2.3999 (2.3974)	model_time 2.3993 (2.3954)	loss 3.0807 (2.7698)	grad_norm 0.4817 (0.4827/0.0349)	mem 48464MB
[2023-11-10 06:20:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][950/1251]	eta 0:12:01 lr 0.023768	time 2.3941 (2.3974)	model_time 2.3938 (2.3954)	loss 2.7964 (2.7720)	grad_norm 0.4957 (0.4836/0.0351)	mem 48464MB
[2023-11-10 06:20:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][960/1251]	eta 0:11:37 lr 0.023606	time 2.3946 (2.3973)	model_time 2.3942 (2.3953)	loss 2.9219 (2.7726)	grad_norm 0.4944 (0.4832/0.0346)	mem 48464MB
[2023-11-10 06:21:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][970/1251]	eta 0:11:13 lr 0.023444	time 2.3951 (2.3973)	model_time 2.3947 (2.3953)	loss 2.1955 (2.7709)	grad_norm 0.4658 (0.4826/0.0348)	mem 48464MB
[2023-11-10 06:21:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][980/1251]	eta 0:10:49 lr 0.023283	time 2.3917 (2.3974)	model_time 2.3914 (2.3955)	loss 3.0887 (2.7711)	grad_norm 0.4935 (0.4829/0.0351)	mem 48464MB
[2023-11-10 06:21:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][990/1251]	eta 0:10:25 lr 0.023122	time 2.3953 (2.3974)	model_time 2.3952 (2.3955)	loss 2.8029 (2.7714)	grad_norm 0.4730 (0.4829/0.0352)	mem 48464MB
[2023-11-10 06:22:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1000/1251]	eta 0:10:01 lr 0.022961	time 2.3946 (2.3974)	model_time 2.3942 (2.3955)	loss 2.8831 (2.7695)	grad_norm 0.5190 (0.4833/0.0352)	mem 48464MB
[2023-11-10 06:22:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1010/1251]	eta 0:09:37 lr 0.022802	time 2.3955 (2.3974)	model_time 2.3951 (2.3955)	loss 3.9235 (2.7698)	grad_norm 0.5203 (0.4840/0.0354)	mem 48464MB
[2023-11-10 06:23:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1020/1251]	eta 0:09:13 lr 0.022642	time 2.3923 (2.3974)	model_time 2.3920 (2.3955)	loss 2.5283 (2.7688)	grad_norm 0.5067 (0.4851/0.0355)	mem 48464MB
[2023-11-10 06:23:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1030/1251]	eta 0:08:49 lr 0.022483	time 2.3954 (2.3973)	model_time 2.3948 (2.3955)	loss 2.6699 (2.7696)	grad_norm 0.4874 (0.4849/0.0354)	mem 48464MB
[2023-11-10 06:23:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1040/1251]	eta 0:08:25 lr 0.022325	time 2.3924 (2.3973)	model_time 2.3922 (2.3954)	loss 2.6520 (2.7699)	grad_norm 0.4631 (0.4845/0.0348)	mem 48464MB
[2023-11-10 06:24:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1050/1251]	eta 0:08:01 lr 0.022167	time 2.3949 (2.3973)	model_time 2.3946 (2.3954)	loss 2.4481 (2.7699)	grad_norm 0.4624 (0.4839/0.0342)	mem 48464MB
[2023-11-10 06:24:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1060/1251]	eta 0:07:37 lr 0.022010	time 2.3951 (2.3973)	model_time 2.3948 (2.3954)	loss 1.9974 (2.7686)	grad_norm 0.4432 (0.4836/0.0343)	mem 48464MB
[2023-11-10 06:25:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1070/1251]	eta 0:07:13 lr 0.021853	time 2.3957 (2.3972)	model_time 2.3954 (2.3954)	loss 2.6545 (2.7701)	grad_norm 0.4723 (0.4838/0.0342)	mem 48464MB
[2023-11-10 06:25:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1080/1251]	eta 0:06:49 lr 0.021696	time 2.3973 (2.3972)	model_time 2.3969 (2.3954)	loss 2.6607 (2.7692)	grad_norm 0.4590 (0.4845/0.0347)	mem 48464MB
[2023-11-10 06:25:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1090/1251]	eta 0:06:25 lr 0.021540	time 2.3966 (2.3972)	model_time 2.3963 (2.3954)	loss 2.9097 (2.7680)	grad_norm 0.5181 (0.4850/0.0349)	mem 48464MB
[2023-11-10 06:26:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1100/1251]	eta 0:06:01 lr 0.021385	time 2.3931 (2.3972)	model_time 2.3927 (2.3954)	loss 2.7232 (2.7654)	grad_norm 0.4365 (0.4847/0.0348)	mem 48464MB
[2023-11-10 06:26:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1110/1251]	eta 0:05:37 lr 0.021230	time 2.3961 (2.3972)	model_time 2.3954 (2.3954)	loss 2.5505 (2.7655)	grad_norm 0.4477 (0.4850/0.0347)	mem 48464MB
[2023-11-10 06:27:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1120/1251]	eta 0:05:14 lr 0.021075	time 2.3938 (2.3971)	model_time 2.3935 (2.3954)	loss 2.7737 (2.7655)	grad_norm 0.4839 (0.4852/0.0342)	mem 48464MB
[2023-11-10 06:27:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1130/1251]	eta 0:04:50 lr 0.020921	time 2.3899 (2.3971)	model_time 2.3896 (2.3953)	loss 2.7886 (2.7655)	grad_norm 0.4461 (0.4845/0.0338)	mem 48464MB
[2023-11-10 06:27:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1140/1251]	eta 0:04:26 lr 0.020768	time 2.3922 (2.3971)	model_time 2.3920 (2.3953)	loss 2.3484 (2.7647)	grad_norm 0.4178 (0.4849/0.0339)	mem 48464MB
[2023-11-10 06:28:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1150/1251]	eta 0:04:02 lr 0.020615	time 2.3941 (2.3970)	model_time 2.3937 (2.3953)	loss 2.6865 (2.7646)	grad_norm 0.4732 (0.4849/0.0337)	mem 48464MB
[2023-11-10 06:28:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1160/1251]	eta 0:03:38 lr 0.020463	time 2.4010 (2.3970)	model_time 2.4004 (2.3953)	loss 3.0100 (2.7670)	grad_norm 0.4227 (0.4845/0.0331)	mem 48464MB
[2023-11-10 06:29:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1170/1251]	eta 0:03:14 lr 0.020311	time 2.3963 (2.3970)	model_time 2.3956 (2.3953)	loss 2.8735 (2.7664)	grad_norm 0.5256 (0.4844/0.0334)	mem 48464MB
[2023-11-10 06:29:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1180/1251]	eta 0:02:50 lr 0.020159	time 2.3967 (2.3970)	model_time 2.3964 (2.3953)	loss 2.1618 (2.7657)	grad_norm 0.4620 (0.4848/0.0336)	mem 48464MB
[2023-11-10 06:29:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1190/1251]	eta 0:02:26 lr 0.020008	time 2.3956 (2.3970)	model_time 2.3953 (2.3953)	loss 2.1122 (2.7663)	grad_norm 0.4422 (0.4850/0.0328)	mem 48464MB
[2023-11-10 06:30:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1200/1251]	eta 0:02:02 lr 0.019858	time 2.3944 (2.3970)	model_time 2.3940 (2.3953)	loss 1.5771 (2.7647)	grad_norm 0.5488 (0.4848/0.0328)	mem 48464MB
[2023-11-10 06:30:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1210/1251]	eta 0:01:38 lr 0.019708	time 2.3921 (2.3969)	model_time 2.3919 (2.3953)	loss 3.0971 (2.7652)	grad_norm 0.5110 (0.4850/0.0331)	mem 48464MB
[2023-11-10 06:31:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1220/1251]	eta 0:01:14 lr 0.019558	time 2.3953 (2.3969)	model_time 2.3949 (2.3953)	loss 1.5196 (2.7643)	grad_norm 0.4486 (0.4846/0.0333)	mem 48464MB
[2023-11-10 06:31:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1230/1251]	eta 0:00:50 lr 0.019409	time 2.3908 (2.3969)	model_time 2.3904 (2.3952)	loss 3.0370 (2.7641)	grad_norm 0.5869 (0.4839/0.0336)	mem 48464MB
[2023-11-10 06:31:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1240/1251]	eta 0:00:26 lr 0.019261	time 2.3927 (2.3969)	model_time 2.3926 (2.3952)	loss 2.7191 (2.7632)	grad_norm 0.4322 (0.4834/0.0334)	mem 48464MB
[2023-11-10 06:32:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [7/10][1250/1251]	eta 0:00:02 lr 0.019113	time 2.3915 (2.3969)	model_time 2.3914 (2.3952)	loss 2.6419 (2.7638)	grad_norm 0.4796 (0.4825/0.0331)	mem 48464MB
[2023-11-10 06:32:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 515): INFO EPOCH 7 training takes 0:49:58
[2023-11-10 06:32:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_7.pth saving......
[2023-11-10 06:34:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_7.pth saved !!!
[2023-11-10 06:34:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.742 (3.742)	Loss 0.6050 (0.6050)	Acc@1 87.695 (87.695)	Acc@5 98.828 (98.828)	Mem 48464MB
[2023-11-10 06:34:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.245 (2.374)	Loss 0.6973 (0.6023)	Acc@1 86.426 (88.281)	Acc@5 97.852 (98.438)	Mem 48464MB
[2023-11-10 06:34:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.252 (2.315)	Loss 0.5356 (0.6038)	Acc@1 89.746 (88.188)	Acc@5 98.633 (98.433)	Mem 48464MB
[2023-11-10 06:35:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.252 (2.295)	Loss 0.6636 (0.6081)	Acc@1 86.523 (88.039)	Acc@5 98.047 (98.469)	Mem 48464MB
[2023-11-10 06:35:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.249 (2.284)	Loss 0.6699 (0.6104)	Acc@1 86.719 (88.045)	Acc@5 97.754 (98.449)	Mem 48464MB
[2023-11-10 06:35:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:7] * Acc@1 88.082 Acc@5 98.466
[2023-11-10 06:35:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 339): INFO Accuracy of the network on the 50000 test images: 88.1%
[2023-11-10 06:35:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saving......
[2023-11-10 06:37:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saved !!!
[2023-11-10 06:37:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 354): INFO Max accuracy: 88.08%
[2023-11-10 06:37:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.773 (3.773)	Loss 0.5981 (0.5981)	Acc@1 87.793 (87.793)	Acc@5 98.730 (98.730)	Mem 48464MB
[2023-11-10 06:38:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.248 (2.377)	Loss 0.6938 (0.5979)	Acc@1 86.230 (88.326)	Acc@5 97.852 (98.402)	Mem 48464MB
[2023-11-10 06:38:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.252 (2.317)	Loss 0.5308 (0.5987)	Acc@1 90.332 (88.328)	Acc@5 98.730 (98.419)	Mem 48464MB
[2023-11-10 06:38:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.252 (2.296)	Loss 0.6567 (0.6037)	Acc@1 86.621 (88.146)	Acc@5 98.145 (98.463)	Mem 48464MB
[2023-11-10 06:39:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.252 (2.286)	Loss 0.6616 (0.6060)	Acc@1 87.012 (88.153)	Acc@5 97.852 (98.454)	Mem 48464MB
[2023-11-10 06:39:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:7] * Acc@1 88.192 Acc@5 98.462
[2023-11-10 06:39:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 359): INFO Accuracy of the ema network on the 50000 test images: 88.2%
[2023-11-10 06:39:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saving......
[2023-11-10 06:41:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saved !!!
[2023-11-10 06:41:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 374): INFO Max ema accuracy: 88.19%
[2023-11-10 06:41:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][0/1251]	eta 1:19:59 lr 0.019098	time 3.8363 (3.8363)	model_time 2.3951 (2.3951)	loss 2.9805 (2.9805)	grad_norm 0.4877 (0.4877/0.0000)	mem 48464MB
[2023-11-10 06:41:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][10/1251]	eta 0:51:56 lr 0.018951	time 2.3865 (2.5110)	model_time 2.3862 (2.3797)	loss 2.3642 (2.9686)	grad_norm 0.5022 (0.4960/0.0349)	mem 48464MB
[2023-11-10 06:42:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][20/1251]	eta 0:50:20 lr 0.018804	time 2.3922 (2.4540)	model_time 2.3920 (2.3849)	loss 3.3615 (2.9765)	grad_norm 0.5435 (0.4994/0.0367)	mem 48464MB
[2023-11-10 06:42:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][30/1251]	eta 0:49:32 lr 0.018658	time 2.3938 (2.4346)	model_time 2.3935 (2.3877)	loss 3.3926 (2.9988)	grad_norm 0.4662 (0.4915/0.0354)	mem 48464MB
[2023-11-10 06:42:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][40/1251]	eta 0:48:56 lr 0.018512	time 2.3922 (2.4247)	model_time 2.3918 (2.3891)	loss 2.7864 (2.8896)	grad_norm 0.4556 (0.4870/0.0345)	mem 48464MB
[2023-11-10 06:43:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][50/1251]	eta 0:48:25 lr 0.018367	time 2.3991 (2.4192)	model_time 2.3986 (2.3905)	loss 3.6975 (2.9239)	grad_norm 0.5132 (0.4868/0.0345)	mem 48464MB
[2023-11-10 06:43:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][60/1251]	eta 0:47:56 lr 0.018222	time 2.3973 (2.4155)	model_time 2.3969 (2.3914)	loss 2.7151 (2.9016)	grad_norm 0.4840 (0.4864/0.0349)	mem 48464MB
[2023-11-10 06:44:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][70/1251]	eta 0:47:29 lr 0.018078	time 2.4003 (2.4129)	model_time 2.3999 (2.3921)	loss 2.9970 (2.9132)	grad_norm 0.4846 (0.4866/0.0347)	mem 48464MB
[2023-11-10 06:44:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][80/1251]	eta 0:47:04 lr 0.017934	time 2.3908 (2.4120)	model_time 2.3906 (2.3938)	loss 1.7410 (2.8843)	grad_norm 0.4321 (0.4853/0.0343)	mem 48464MB
[2023-11-10 06:44:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][90/1251]	eta 0:46:38 lr 0.017791	time 2.3965 (2.4103)	model_time 2.3961 (2.3940)	loss 2.8040 (2.8952)	grad_norm 0.5106 (0.4856/0.0340)	mem 48464MB
[2023-11-10 06:45:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][100/1251]	eta 0:46:12 lr 0.017648	time 2.3936 (2.4089)	model_time 2.3933 (2.3941)	loss 2.2106 (2.8653)	grad_norm 0.4604 (0.4850/0.0341)	mem 48464MB
[2023-11-10 06:45:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][110/1251]	eta 0:45:47 lr 0.017506	time 2.3922 (2.4076)	model_time 2.3917 (2.3941)	loss 2.9695 (2.8766)	grad_norm 0.4979 (0.4838/0.0338)	mem 48464MB
[2023-11-10 06:46:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][120/1251]	eta 0:45:21 lr 0.017364	time 2.3977 (2.4066)	model_time 2.3974 (2.3942)	loss 2.5051 (2.8711)	grad_norm 0.4639 (0.4839/0.0335)	mem 48464MB
[2023-11-10 06:46:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][130/1251]	eta 0:44:56 lr 0.017223	time 2.3976 (2.4058)	model_time 2.3974 (2.3943)	loss 2.9446 (2.8627)	grad_norm 0.4952 (0.4837/0.0336)	mem 48464MB
[2023-11-10 06:46:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][140/1251]	eta 0:44:31 lr 0.017082	time 2.3995 (2.4050)	model_time 2.3988 (2.3942)	loss 1.8652 (2.8422)	grad_norm 0.4868 (0.4829/0.0334)	mem 48464MB
[2023-11-10 06:47:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][150/1251]	eta 0:44:07 lr 0.016942	time 2.3925 (2.4045)	model_time 2.3922 (2.3943)	loss 2.2914 (2.8303)	grad_norm 0.4489 (0.4836/0.0339)	mem 48464MB
[2023-11-10 06:47:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][160/1251]	eta 0:43:42 lr 0.016802	time 2.3979 (2.4039)	model_time 2.3976 (2.3943)	loss 3.5346 (2.8301)	grad_norm 0.4871 (0.4823/0.0341)	mem 48464MB
[2023-11-10 06:48:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][170/1251]	eta 0:43:18 lr 0.016663	time 2.3938 (2.4034)	model_time 2.3934 (2.3944)	loss 2.9482 (2.8406)	grad_norm 0.5647 (0.4832/0.0352)	mem 48464MB
[2023-11-10 06:48:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][180/1251]	eta 0:42:53 lr 0.016525	time 2.3946 (2.4027)	model_time 2.3944 (2.3942)	loss 2.9479 (2.8248)	grad_norm 0.4751 (0.4835/0.0346)	mem 48464MB
[2023-11-10 06:48:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][190/1251]	eta 0:42:29 lr 0.016387	time 2.3900 (2.4029)	model_time 2.3897 (2.3948)	loss 3.5466 (2.8262)	grad_norm 0.4735 (0.4837/0.0345)	mem 48464MB
[2023-11-10 06:49:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][200/1251]	eta 0:42:04 lr 0.016249	time 2.3923 (2.4024)	model_time 2.3921 (2.3947)	loss 1.9425 (2.8293)	grad_norm 0.4967 (0.4837/0.0343)	mem 48464MB
[2023-11-10 06:49:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][210/1251]	eta 0:41:40 lr 0.016112	time 2.3941 (2.4023)	model_time 2.3938 (2.3948)	loss 1.5678 (2.8229)	grad_norm 0.4927 (0.4842/0.0340)	mem 48464MB
[2023-11-10 06:50:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][220/1251]	eta 0:41:16 lr 0.015976	time 2.3994 (2.4020)	model_time 2.3989 (2.3948)	loss 2.2368 (2.8075)	grad_norm 0.4385 (0.4839/0.0342)	mem 48464MB
[2023-11-10 06:50:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][230/1251]	eta 0:40:52 lr 0.015840	time 2.3953 (2.4017)	model_time 2.3950 (2.3948)	loss 1.6335 (2.8038)	grad_norm 0.4693 (0.4842/0.0343)	mem 48464MB
[2023-11-10 06:50:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][240/1251]	eta 0:40:27 lr 0.015705	time 2.3994 (2.4014)	model_time 2.3991 (2.3948)	loss 3.7456 (2.8144)	grad_norm 0.5036 (0.4837/0.0342)	mem 48464MB
[2023-11-10 06:51:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][250/1251]	eta 0:40:03 lr 0.015570	time 2.3953 (2.4012)	model_time 2.3947 (2.3948)	loss 3.2786 (2.8237)	grad_norm 0.4797 (0.4842/0.0340)	mem 48464MB
[2023-11-10 06:51:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][260/1251]	eta 0:39:39 lr 0.015436	time 2.3880 (2.4009)	model_time 2.3877 (2.3947)	loss 2.9072 (2.8260)	grad_norm 0.5018 (0.4838/0.0339)	mem 48464MB
[2023-11-10 06:52:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][270/1251]	eta 0:39:14 lr 0.015302	time 2.3982 (2.4006)	model_time 2.3977 (2.3947)	loss 3.6146 (2.8181)	grad_norm 0.4712 (0.4838/0.0334)	mem 48464MB
[2023-11-10 06:52:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][280/1251]	eta 0:38:50 lr 0.015169	time 2.3979 (2.4004)	model_time 2.3975 (2.3947)	loss 3.4762 (2.8184)	grad_norm 0.5349 (0.4842/0.0335)	mem 48464MB
[2023-11-10 06:52:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][290/1251]	eta 0:38:26 lr 0.015036	time 2.3967 (2.4003)	model_time 2.3963 (2.3947)	loss 3.5343 (2.8170)	grad_norm 0.5181 (0.4840/0.0334)	mem 48464MB
[2023-11-10 06:53:17 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][300/1251]	eta 0:38:02 lr 0.014904	time 2.3941 (2.4001)	model_time 2.3937 (2.3947)	loss 1.9932 (2.8228)	grad_norm 0.4315 (0.4838/0.0335)	mem 48464MB
[2023-11-10 06:53:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][310/1251]	eta 0:37:38 lr 0.014772	time 2.3967 (2.3999)	model_time 2.3964 (2.3947)	loss 2.7968 (2.8126)	grad_norm 0.5641 (0.4829/0.0339)	mem 48464MB
[2023-11-10 06:54:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][320/1251]	eta 0:37:14 lr 0.014641	time 2.3941 (2.3997)	model_time 2.3936 (2.3946)	loss 2.9900 (2.8082)	grad_norm 0.4995 (0.4823/0.0333)	mem 48464MB
[2023-11-10 06:54:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][330/1251]	eta 0:36:50 lr 0.014510	time 2.3986 (2.3996)	model_time 2.3983 (2.3946)	loss 3.3681 (2.8144)	grad_norm 0.4490 (0.4821/0.0334)	mem 48464MB
[2023-11-10 06:54:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][340/1251]	eta 0:36:25 lr 0.014380	time 2.3937 (2.3995)	model_time 2.3933 (2.3947)	loss 2.2644 (2.8125)	grad_norm 0.4694 (0.4825/0.0335)	mem 48464MB
[2023-11-10 06:55:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][350/1251]	eta 0:36:01 lr 0.014251	time 2.3987 (2.3994)	model_time 2.3980 (2.3947)	loss 3.0232 (2.8131)	grad_norm 0.4675 (0.4829/0.0337)	mem 48464MB
[2023-11-10 06:55:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][360/1251]	eta 0:35:37 lr 0.014122	time 2.3931 (2.3992)	model_time 2.3928 (2.3947)	loss 2.8680 (2.8168)	grad_norm 0.4628 (0.4828/0.0337)	mem 48464MB
[2023-11-10 06:56:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][370/1251]	eta 0:35:13 lr 0.013994	time 2.3944 (2.3991)	model_time 2.3941 (2.3947)	loss 2.0959 (2.8068)	grad_norm 0.4579 (0.4826/0.0334)	mem 48464MB
[2023-11-10 06:56:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][380/1251]	eta 0:34:49 lr 0.013866	time 2.3940 (2.3990)	model_time 2.3933 (2.3946)	loss 2.8015 (2.8024)	grad_norm 0.5317 (0.4829/0.0338)	mem 48464MB
[2023-11-10 06:56:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][390/1251]	eta 0:34:25 lr 0.013738	time 2.3957 (2.3989)	model_time 2.3954 (2.3947)	loss 2.4991 (2.8013)	grad_norm 0.4971 (0.4829/0.0340)	mem 48464MB
[2023-11-10 06:57:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][400/1251]	eta 0:34:01 lr 0.013612	time 2.3945 (2.3989)	model_time 2.3941 (2.3947)	loss 2.9765 (2.7951)	grad_norm 0.4956 (0.4838/0.0342)	mem 48464MB
[2023-11-10 06:57:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][410/1251]	eta 0:33:37 lr 0.013485	time 2.3915 (2.3987)	model_time 2.3911 (2.3946)	loss 2.4282 (2.7949)	grad_norm 0.5258 (0.4850/0.0345)	mem 48464MB
[2023-11-10 06:58:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][420/1251]	eta 0:33:13 lr 0.013360	time 2.3961 (2.3987)	model_time 2.3957 (2.3947)	loss 2.2306 (2.7903)	grad_norm 0.4503 (0.4846/0.0347)	mem 48464MB
[2023-11-10 06:58:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][430/1251]	eta 0:32:49 lr 0.013235	time 2.3951 (2.3986)	model_time 2.3949 (2.3947)	loss 1.7448 (2.7863)	grad_norm 0.4582 (0.4848/0.0347)	mem 48464MB
[2023-11-10 06:58:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][440/1251]	eta 0:32:25 lr 0.013110	time 2.3967 (2.3986)	model_time 2.3963 (2.3947)	loss 2.5165 (2.7894)	grad_norm 0.4727 (0.4849/0.0344)	mem 48464MB
[2023-11-10 06:59:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][450/1251]	eta 0:32:01 lr 0.012986	time 2.3938 (2.3985)	model_time 2.3935 (2.3947)	loss 2.6439 (2.7876)	grad_norm 0.4969 (0.4839/0.0341)	mem 48464MB
[2023-11-10 06:59:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][460/1251]	eta 0:31:37 lr 0.012863	time 2.3919 (2.3984)	model_time 2.3916 (2.3947)	loss 3.1286 (2.7853)	grad_norm 0.4749 (0.4847/0.0336)	mem 48464MB
[2023-11-10 07:00:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][470/1251]	eta 0:31:13 lr 0.012740	time 2.3963 (2.3984)	model_time 2.3959 (2.3947)	loss 2.5810 (2.7899)	grad_norm 0.5009 (0.4855/0.0347)	mem 48464MB
[2023-11-10 07:00:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][480/1251]	eta 0:30:49 lr 0.012617	time 2.3925 (2.3983)	model_time 2.3919 (2.3946)	loss 2.2595 (2.7848)	grad_norm 0.4768 (0.4853/0.0353)	mem 48464MB
[2023-11-10 07:00:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][490/1251]	eta 0:30:25 lr 0.012495	time 2.3931 (2.3982)	model_time 2.3925 (2.3946)	loss 3.7294 (2.7860)	grad_norm 0.4727 (0.4850/0.0352)	mem 48464MB
[2023-11-10 07:01:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][500/1251]	eta 0:30:01 lr 0.012374	time 2.3954 (2.3985)	model_time 2.3951 (2.3949)	loss 2.9138 (2.7903)	grad_norm 0.4957 (0.4855/0.0351)	mem 48464MB
[2023-11-10 07:01:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][510/1251]	eta 0:29:37 lr 0.012253	time 2.3940 (2.3984)	model_time 2.3937 (2.3949)	loss 2.9412 (2.7900)	grad_norm 0.4615 (0.4849/0.0355)	mem 48464MB
[2023-11-10 07:02:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][520/1251]	eta 0:29:13 lr 0.012133	time 2.3931 (2.3983)	model_time 2.3926 (2.3949)	loss 3.7143 (2.7903)	grad_norm 0.4823 (0.4848/0.0351)	mem 48464MB
[2023-11-10 07:02:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][530/1251]	eta 0:28:49 lr 0.012014	time 2.3949 (2.3982)	model_time 2.3944 (2.3949)	loss 1.6245 (2.7835)	grad_norm 0.4708 (0.4846/0.0349)	mem 48464MB
[2023-11-10 07:02:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][540/1251]	eta 0:28:25 lr 0.011895	time 2.3921 (2.3981)	model_time 2.3919 (2.3949)	loss 3.5605 (2.7855)	grad_norm 0.4715 (0.4850/0.0353)	mem 48464MB
[2023-11-10 07:03:16 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][550/1251]	eta 0:28:01 lr 0.011776	time 2.3933 (2.3981)	model_time 2.3929 (2.3948)	loss 1.6513 (2.7821)	grad_norm 0.4190 (0.4844/0.0357)	mem 48464MB
[2023-11-10 07:03:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][560/1251]	eta 0:27:37 lr 0.011658	time 2.3955 (2.3980)	model_time 2.3949 (2.3948)	loss 3.2282 (2.7740)	grad_norm 0.5395 (0.4849/0.0360)	mem 48464MB
[2023-11-10 07:04:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][570/1251]	eta 0:27:13 lr 0.011541	time 2.3980 (2.3979)	model_time 2.3976 (2.3948)	loss 3.6500 (2.7740)	grad_norm 0.4561 (0.4849/0.0362)	mem 48464MB
[2023-11-10 07:04:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][580/1251]	eta 0:26:49 lr 0.011424	time 2.3927 (2.3979)	model_time 2.3923 (2.3948)	loss 1.7918 (2.7744)	grad_norm 0.4371 (0.4845/0.0361)	mem 48464MB
[2023-11-10 07:04:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][590/1251]	eta 0:26:24 lr 0.011308	time 2.3923 (2.3978)	model_time 2.3921 (2.3948)	loss 2.6558 (2.7697)	grad_norm 0.5223 (0.4850/0.0366)	mem 48464MB
[2023-11-10 07:05:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][600/1251]	eta 0:26:00 lr 0.011192	time 2.3969 (2.3978)	model_time 2.3965 (2.3948)	loss 2.9689 (2.7692)	grad_norm 0.4900 (0.4849/0.0365)	mem 48464MB
[2023-11-10 07:05:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][610/1251]	eta 0:25:37 lr 0.011077	time 2.5051 (2.3979)	model_time 2.5048 (2.3950)	loss 2.7798 (2.7683)	grad_norm 0.4317 (0.4858/0.0359)	mem 48464MB
[2023-11-10 07:06:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][620/1251]	eta 0:25:13 lr 0.010962	time 2.3925 (2.3979)	model_time 2.3923 (2.3949)	loss 3.3442 (2.7663)	grad_norm 0.4476 (0.4851/0.0362)	mem 48464MB
[2023-11-10 07:06:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][630/1251]	eta 0:24:49 lr 0.010848	time 2.3946 (2.3978)	model_time 2.3943 (2.3949)	loss 2.2959 (2.7738)	grad_norm 0.5358 (0.4860/0.0364)	mem 48464MB
[2023-11-10 07:06:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][640/1251]	eta 0:24:25 lr 0.010735	time 2.3981 (2.3978)	model_time 2.3978 (2.3949)	loss 3.4134 (2.7715)	grad_norm 0.4514 (0.4863/0.0364)	mem 48464MB
[2023-11-10 07:07:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][650/1251]	eta 0:24:01 lr 0.010622	time 2.3938 (2.3977)	model_time 2.3935 (2.3949)	loss 2.4355 (2.7688)	grad_norm 0.5198 (0.4858/0.0361)	mem 48464MB
[2023-11-10 07:07:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][660/1251]	eta 0:23:37 lr 0.010509	time 2.3969 (2.3977)	model_time 2.3955 (2.3949)	loss 3.0288 (2.7698)	grad_norm 0.4711 (0.4856/0.0358)	mem 48464MB
[2023-11-10 07:08:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][670/1251]	eta 0:23:13 lr 0.010398	time 2.3955 (2.3978)	model_time 2.3951 (2.3951)	loss 2.6653 (2.7686)	grad_norm 0.4858 (0.4855/0.0359)	mem 48464MB
[2023-11-10 07:08:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][680/1251]	eta 0:22:49 lr 0.010286	time 2.3900 (2.3977)	model_time 2.3898 (2.3950)	loss 3.0866 (2.7729)	grad_norm 0.4419 (0.4855/0.0358)	mem 48464MB
[2023-11-10 07:08:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][690/1251]	eta 0:22:25 lr 0.010176	time 2.3924 (2.3977)	model_time 2.3921 (2.3950)	loss 3.0532 (2.7728)	grad_norm 0.4590 (0.4853/0.0356)	mem 48464MB
[2023-11-10 07:09:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][700/1251]	eta 0:22:01 lr 0.010066	time 2.3927 (2.3976)	model_time 2.3925 (2.3950)	loss 2.2979 (2.7738)	grad_norm 0.4296 (0.4846/0.0355)	mem 48464MB
[2023-11-10 07:09:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][710/1251]	eta 0:21:37 lr 0.009956	time 2.3949 (2.3976)	model_time 2.3942 (2.3950)	loss 1.4805 (2.7717)	grad_norm 0.4389 (0.4835/0.0352)	mem 48464MB
[2023-11-10 07:10:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][720/1251]	eta 0:21:13 lr 0.009847	time 2.3928 (2.3976)	model_time 2.3923 (2.3950)	loss 3.0025 (2.7710)	grad_norm 0.4964 (0.4835/0.0349)	mem 48464MB
[2023-11-10 07:10:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][730/1251]	eta 0:20:49 lr 0.009739	time 2.3929 (2.3975)	model_time 2.3918 (2.3950)	loss 3.6286 (2.7736)	grad_norm 0.4980 (0.4833/0.0347)	mem 48464MB
[2023-11-10 07:10:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][740/1251]	eta 0:20:25 lr 0.009631	time 2.3930 (2.3975)	model_time 2.3924 (2.3950)	loss 2.2878 (2.7739)	grad_norm 0.4659 (0.4838/0.0350)	mem 48464MB
[2023-11-10 07:11:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][750/1251]	eta 0:20:01 lr 0.009524	time 2.3984 (2.3975)	model_time 2.3980 (2.3950)	loss 2.1951 (2.7748)	grad_norm 0.4764 (0.4846/0.0350)	mem 48464MB
[2023-11-10 07:11:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][760/1251]	eta 0:19:37 lr 0.009417	time 2.3929 (2.3975)	model_time 2.3926 (2.3950)	loss 2.7375 (2.7742)	grad_norm 0.4939 (0.4839/0.0352)	mem 48464MB
[2023-11-10 07:12:03 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][770/1251]	eta 0:19:13 lr 0.009311	time 2.3957 (2.3975)	model_time 2.3954 (2.3950)	loss 2.6244 (2.7722)	grad_norm 0.4825 (0.4831/0.0342)	mem 48464MB
[2023-11-10 07:12:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][780/1251]	eta 0:18:49 lr 0.009205	time 2.3933 (2.3974)	model_time 2.3929 (2.3950)	loss 2.9845 (2.7747)	grad_norm 0.5068 (0.4828/0.0335)	mem 48464MB
[2023-11-10 07:12:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][790/1251]	eta 0:18:25 lr 0.009100	time 2.3943 (2.3974)	model_time 2.3941 (2.3950)	loss 1.8322 (2.7702)	grad_norm 0.4527 (0.4830/0.0335)	mem 48464MB
[2023-11-10 07:13:15 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][800/1251]	eta 0:18:01 lr 0.008996	time 2.3962 (2.3973)	model_time 2.3958 (2.3949)	loss 2.7439 (2.7690)	grad_norm 0.5478 (0.4830/0.0337)	mem 48464MB
[2023-11-10 07:13:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][810/1251]	eta 0:17:37 lr 0.008892	time 2.3923 (2.3973)	model_time 2.3918 (2.3949)	loss 2.9575 (2.7696)	grad_norm 0.4972 (0.4835/0.0332)	mem 48464MB
[2023-11-10 07:14:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][820/1251]	eta 0:17:13 lr 0.008789	time 2.3974 (2.3973)	model_time 2.3972 (2.3949)	loss 2.1895 (2.7669)	grad_norm 0.4594 (0.4832/0.0339)	mem 48464MB
[2023-11-10 07:14:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][830/1251]	eta 0:16:49 lr 0.008686	time 2.3904 (2.3974)	model_time 2.3900 (2.3950)	loss 3.1051 (2.7684)	grad_norm 0.4878 (0.4836/0.0343)	mem 48464MB
[2023-11-10 07:14:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][840/1251]	eta 0:16:25 lr 0.008584	time 2.3946 (2.3973)	model_time 2.3941 (2.3950)	loss 3.4737 (2.7699)	grad_norm 0.4830 (0.4843/0.0340)	mem 48464MB
[2023-11-10 07:15:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][850/1251]	eta 0:16:01 lr 0.008483	time 2.3962 (2.3973)	model_time 2.3959 (2.3950)	loss 2.7774 (2.7703)	grad_norm 0.4386 (0.4849/0.0340)	mem 48464MB
[2023-11-10 07:15:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][860/1251]	eta 0:15:37 lr 0.008382	time 2.3964 (2.3972)	model_time 2.3961 (2.3950)	loss 2.4614 (2.7720)	grad_norm 0.4629 (0.4850/0.0337)	mem 48464MB
[2023-11-10 07:16:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][870/1251]	eta 0:15:13 lr 0.008281	time 2.3930 (2.3972)	model_time 2.3925 (2.3950)	loss 3.9230 (2.7725)	grad_norm 0.4979 (0.4848/0.0342)	mem 48464MB
[2023-11-10 07:16:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][880/1251]	eta 0:14:49 lr 0.008182	time 2.3925 (2.3972)	model_time 2.3923 (2.3950)	loss 3.0014 (2.7731)	grad_norm 0.4847 (0.4848/0.0342)	mem 48464MB
[2023-11-10 07:16:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][890/1251]	eta 0:14:25 lr 0.008083	time 2.3921 (2.3971)	model_time 2.3919 (2.3950)	loss 2.5363 (2.7712)	grad_norm 0.4404 (0.4847/0.0338)	mem 48464MB
[2023-11-10 07:17:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][900/1251]	eta 0:14:01 lr 0.007984	time 2.3944 (2.3971)	model_time 2.3941 (2.3949)	loss 2.6712 (2.7717)	grad_norm 0.4493 (0.4848/0.0339)	mem 48464MB
[2023-11-10 07:17:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][910/1251]	eta 0:13:37 lr 0.007886	time 2.3934 (2.3971)	model_time 2.3930 (2.3949)	loss 2.9077 (2.7690)	grad_norm 0.4693 (0.4855/0.0342)	mem 48464MB
[2023-11-10 07:18:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][920/1251]	eta 0:13:13 lr 0.007788	time 2.3912 (2.3970)	model_time 2.3910 (2.3949)	loss 1.8947 (2.7675)	grad_norm 0.4664 (0.4853/0.0345)	mem 48464MB
[2023-11-10 07:18:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][930/1251]	eta 0:12:49 lr 0.007692	time 2.3956 (2.3970)	model_time 2.3953 (2.3949)	loss 2.3758 (2.7662)	grad_norm 0.4384 (0.4847/0.0345)	mem 48464MB
[2023-11-10 07:18:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][940/1251]	eta 0:12:25 lr 0.007595	time 2.3925 (2.3970)	model_time 2.3922 (2.3949)	loss 2.7630 (2.7682)	grad_norm 0.4875 (0.4843/0.0342)	mem 48464MB
[2023-11-10 07:19:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][950/1251]	eta 0:12:01 lr 0.007500	time 2.3948 (2.3969)	model_time 2.3944 (2.3949)	loss 1.4868 (2.7667)	grad_norm 0.4855 (0.4835/0.0346)	mem 48464MB
[2023-11-10 07:19:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][960/1251]	eta 0:11:37 lr 0.007404	time 2.3957 (2.3969)	model_time 2.3953 (2.3949)	loss 3.3969 (2.7704)	grad_norm 0.4516 (0.4836/0.0348)	mem 48464MB
[2023-11-10 07:20:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][970/1251]	eta 0:11:13 lr 0.007310	time 2.3974 (2.3969)	model_time 2.3968 (2.3949)	loss 3.6374 (2.7701)	grad_norm 0.4547 (0.4834/0.0349)	mem 48464MB
[2023-11-10 07:20:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][980/1251]	eta 0:10:49 lr 0.007216	time 2.3948 (2.3970)	model_time 2.3942 (2.3950)	loss 2.7641 (2.7697)	grad_norm 0.4665 (0.4840/0.0350)	mem 48464MB
[2023-11-10 07:20:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][990/1251]	eta 0:10:25 lr 0.007123	time 2.3985 (2.3970)	model_time 2.3977 (2.3950)	loss 1.9068 (2.7694)	grad_norm 0.4548 (0.4843/0.0350)	mem 48464MB
[2023-11-10 07:21:14 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1000/1251]	eta 0:10:01 lr 0.007030	time 2.3984 (2.3970)	model_time 2.3982 (2.3950)	loss 2.8816 (2.7716)	grad_norm 0.4783 (0.4850/0.0352)	mem 48464MB
[2023-11-10 07:21:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1010/1251]	eta 0:09:37 lr 0.006938	time 2.3958 (2.3970)	model_time 2.3955 (2.3950)	loss 2.6435 (2.7738)	grad_norm 0.5087 (0.4860/0.0355)	mem 48464MB
[2023-11-10 07:22:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1020/1251]	eta 0:09:13 lr 0.006846	time 2.3916 (2.3970)	model_time 2.3914 (2.3950)	loss 2.5966 (2.7745)	grad_norm 0.5263 (0.4859/0.0358)	mem 48464MB
[2023-11-10 07:22:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1030/1251]	eta 0:08:49 lr 0.006755	time 2.3987 (2.3970)	model_time 2.3983 (2.3950)	loss 3.0552 (2.7743)	grad_norm 0.4899 (0.4861/0.0357)	mem 48464MB
[2023-11-10 07:22:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1040/1251]	eta 0:08:25 lr 0.006664	time 2.3955 (2.3970)	model_time 2.3952 (2.3950)	loss 2.0429 (2.7766)	grad_norm 0.4370 (0.4858/0.0357)	mem 48464MB
[2023-11-10 07:23:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1050/1251]	eta 0:08:01 lr 0.006575	time 2.3940 (2.3969)	model_time 2.3937 (2.3950)	loss 3.1168 (2.7738)	grad_norm 0.4453 (0.4853/0.0360)	mem 48464MB
[2023-11-10 07:23:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1060/1251]	eta 0:07:37 lr 0.006485	time 2.3923 (2.3969)	model_time 2.3921 (2.3950)	loss 2.9010 (2.7754)	grad_norm 0.5316 (0.4859/0.0358)	mem 48464MB
[2023-11-10 07:24:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1070/1251]	eta 0:07:13 lr 0.006397	time 2.3971 (2.3969)	model_time 2.3967 (2.3950)	loss 2.8879 (2.7749)	grad_norm 0.4807 (0.4857/0.0357)	mem 48464MB
[2023-11-10 07:24:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1080/1251]	eta 0:06:49 lr 0.006309	time 2.3940 (2.3969)	model_time 2.3936 (2.3950)	loss 2.2453 (2.7737)	grad_norm 0.4248 (0.4852/0.0357)	mem 48464MB
[2023-11-10 07:24:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1090/1251]	eta 0:06:25 lr 0.006221	time 2.3953 (2.3969)	model_time 2.3951 (2.3950)	loss 3.6491 (2.7755)	grad_norm 0.5537 (0.4850/0.0358)	mem 48464MB
[2023-11-10 07:25:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1100/1251]	eta 0:06:01 lr 0.006134	time 2.4004 (2.3968)	model_time 2.4002 (2.3950)	loss 3.0677 (2.7725)	grad_norm 0.4874 (0.4844/0.0355)	mem 48464MB
[2023-11-10 07:25:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1110/1251]	eta 0:05:37 lr 0.006048	time 2.3935 (2.3968)	model_time 2.3929 (2.3950)	loss 2.8531 (2.7734)	grad_norm 0.5548 (0.4846/0.0363)	mem 48464MB
[2023-11-10 07:26:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1120/1251]	eta 0:05:13 lr 0.005962	time 2.3929 (2.3968)	model_time 2.3925 (2.3950)	loss 2.9159 (2.7737)	grad_norm 0.4580 (0.4851/0.0360)	mem 48464MB
[2023-11-10 07:26:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1130/1251]	eta 0:04:50 lr 0.005877	time 2.3978 (2.3968)	model_time 2.3975 (2.3950)	loss 3.0230 (2.7735)	grad_norm 0.4933 (0.4848/0.0354)	mem 48464MB
[2023-11-10 07:26:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1140/1251]	eta 0:04:26 lr 0.005793	time 2.5183 (2.3969)	model_time 2.5180 (2.3951)	loss 2.6817 (2.7754)	grad_norm 0.5001 (0.4842/0.0352)	mem 48464MB
[2023-11-10 07:27:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1150/1251]	eta 0:04:02 lr 0.005709	time 2.3927 (2.3969)	model_time 2.3923 (2.3951)	loss 2.6792 (2.7754)	grad_norm 0.4544 (0.4833/0.0347)	mem 48464MB
[2023-11-10 07:27:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1160/1251]	eta 0:03:38 lr 0.005625	time 2.3957 (2.3969)	model_time 2.3951 (2.3951)	loss 2.0448 (2.7752)	grad_norm 0.5335 (0.4836/0.0349)	mem 48464MB
[2023-11-10 07:28:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1170/1251]	eta 0:03:14 lr 0.005543	time 2.3966 (2.3969)	model_time 2.3959 (2.3951)	loss 2.3721 (2.7727)	grad_norm 0.4745 (0.4833/0.0342)	mem 48464MB
[2023-11-10 07:28:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1180/1251]	eta 0:02:50 lr 0.005460	time 2.3954 (2.3969)	model_time 2.3948 (2.3951)	loss 2.8519 (2.7708)	grad_norm 0.4742 (0.4835/0.0348)	mem 48464MB
[2023-11-10 07:28:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1190/1251]	eta 0:02:26 lr 0.005379	time 2.3936 (2.3969)	model_time 2.3932 (2.3951)	loss 2.9027 (2.7701)	grad_norm 0.4925 (0.4826/0.0346)	mem 48464MB
[2023-11-10 07:29:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1200/1251]	eta 0:02:02 lr 0.005298	time 2.3962 (2.3968)	model_time 2.3958 (2.3951)	loss 2.4157 (2.7692)	grad_norm 0.4764 (0.4826/0.0344)	mem 48464MB
[2023-11-10 07:29:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1210/1251]	eta 0:01:38 lr 0.005218	time 2.3950 (2.3968)	model_time 2.3947 (2.3951)	loss 3.2491 (2.7709)	grad_norm 0.5207 (0.4821/0.0346)	mem 48464MB
[2023-11-10 07:30:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1220/1251]	eta 0:01:14 lr 0.005138	time 2.4017 (2.3968)	model_time 2.4012 (2.3951)	loss 2.3003 (2.7701)	grad_norm 0.4348 (0.4825/0.0343)	mem 48464MB
[2023-11-10 07:30:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1230/1251]	eta 0:00:50 lr 0.005059	time 2.3923 (2.3968)	model_time 2.3919 (2.3951)	loss 2.9044 (2.7708)	grad_norm 0.5019 (0.4825/0.0342)	mem 48464MB
[2023-11-10 07:30:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1240/1251]	eta 0:00:26 lr 0.004980	time 2.3936 (2.3968)	model_time 2.3935 (2.3951)	loss 1.6716 (2.7690)	grad_norm 0.4550 (0.4827/0.0344)	mem 48464MB
[2023-11-10 07:31:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [8/10][1250/1251]	eta 0:00:02 lr 0.004902	time 2.3894 (2.3968)	model_time 2.3893 (2.3951)	loss 3.0801 (2.7695)	grad_norm 0.4526 (0.4836/0.0345)	mem 48464MB
[2023-11-10 07:31:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 515): INFO EPOCH 8 training takes 0:49:58
[2023-11-10 07:31:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_8.pth saving......
[2023-11-10 07:33:01 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_8.pth saved !!!
[2023-11-10 07:33:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.754 (3.754)	Loss 0.5898 (0.5898)	Acc@1 88.281 (88.281)	Acc@5 98.730 (98.730)	Mem 48464MB
[2023-11-10 07:33:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.247 (2.376)	Loss 0.6816 (0.5884)	Acc@1 86.230 (88.459)	Acc@5 98.047 (98.420)	Mem 48464MB
[2023-11-10 07:33:51 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.251 (2.316)	Loss 0.5234 (0.5901)	Acc@1 90.137 (88.342)	Acc@5 98.730 (98.438)	Mem 48464MB
[2023-11-10 07:34:13 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.252 (2.295)	Loss 0.6504 (0.5953)	Acc@1 86.426 (88.193)	Acc@5 98.145 (98.466)	Mem 48464MB
[2023-11-10 07:34:36 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.253 (2.285)	Loss 0.6567 (0.5976)	Acc@1 86.816 (88.207)	Acc@5 97.852 (98.452)	Mem 48464MB
[2023-11-10 07:34:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:8] * Acc@1 88.234 Acc@5 98.466
[2023-11-10 07:34:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 339): INFO Accuracy of the network on the 50000 test images: 88.2%
[2023-11-10 07:34:53 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saving......
[2023-11-10 07:36:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_best.pth saved !!!
[2023-11-10 07:36:37 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 354): INFO Max accuracy: 88.23%
[2023-11-10 07:36:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.791 (3.791)	Loss 0.5981 (0.5981)	Acc@1 88.379 (88.379)	Acc@5 98.730 (98.730)	Mem 48464MB
[2023-11-10 07:37:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.246 (2.378)	Loss 0.6938 (0.5984)	Acc@1 86.328 (88.485)	Acc@5 97.852 (98.411)	Mem 48464MB
[2023-11-10 07:37:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.254 (2.317)	Loss 0.5332 (0.6000)	Acc@1 90.039 (88.360)	Acc@5 98.730 (98.442)	Mem 48464MB
[2023-11-10 07:37:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.251 (2.296)	Loss 0.6592 (0.6051)	Acc@1 86.621 (88.202)	Acc@5 98.145 (98.475)	Mem 48464MB
[2023-11-10 07:38:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.253 (2.286)	Loss 0.6655 (0.6075)	Acc@1 86.914 (88.191)	Acc@5 97.852 (98.454)	Mem 48464MB
[2023-11-10 07:38:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:8] * Acc@1 88.228 Acc@5 98.472
[2023-11-10 07:38:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 359): INFO Accuracy of the ema network on the 50000 test images: 88.2%
[2023-11-10 07:38:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saving......
[2023-11-10 07:40:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth saved !!!
[2023-11-10 07:40:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 374): INFO Max ema accuracy: 88.23%
[2023-11-10 07:40:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][0/1251]	eta 1:20:19 lr 0.004894	time 3.8527 (3.8527)	model_time 2.3982 (2.3982)	loss 2.0861 (2.0861)	grad_norm 0.4375 (0.4375/0.0000)	mem 48464MB
[2023-11-10 07:40:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][10/1251]	eta 0:52:00 lr 0.004817	time 2.3851 (2.5145)	model_time 2.3848 (2.3818)	loss 2.5705 (2.6278)	grad_norm 0.4951 (0.4857/0.0272)	mem 48464MB
[2023-11-10 07:40:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][20/1251]	eta 0:50:24 lr 0.004740	time 2.3942 (2.4569)	model_time 2.3940 (2.3872)	loss 1.5776 (2.5273)	grad_norm 0.4728 (0.4837/0.0284)	mem 48464MB
[2023-11-10 07:41:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][30/1251]	eta 0:49:35 lr 0.004664	time 2.3982 (2.4372)	model_time 2.3979 (2.3898)	loss 2.4480 (2.6158)	grad_norm 0.4613 (0.4876/0.0334)	mem 48464MB
[2023-11-10 07:41:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][40/1251]	eta 0:49:01 lr 0.004589	time 2.4798 (2.4286)	model_time 2.4796 (2.3926)	loss 1.6857 (2.6699)	grad_norm 0.5036 (0.4912/0.0341)	mem 48464MB
[2023-11-10 07:42:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][50/1251]	eta 0:48:31 lr 0.004514	time 2.3944 (2.4241)	model_time 2.3939 (2.3951)	loss 1.6850 (2.7097)	grad_norm 0.4690 (0.4938/0.0354)	mem 48464MB
[2023-11-10 07:42:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][60/1251]	eta 0:48:00 lr 0.004440	time 2.3919 (2.4190)	model_time 2.3915 (2.3946)	loss 2.1304 (2.6918)	grad_norm 0.4759 (0.4899/0.0361)	mem 48464MB
[2023-11-10 07:42:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][70/1251]	eta 0:47:34 lr 0.004366	time 2.5189 (2.4173)	model_time 2.5185 (2.3964)	loss 2.7114 (2.7114)	grad_norm 0.5151 (0.4876/0.0362)	mem 48464MB
[2023-11-10 07:43:23 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][80/1251]	eta 0:47:07 lr 0.004293	time 2.3972 (2.4146)	model_time 2.3970 (2.3961)	loss 3.5249 (2.7261)	grad_norm 0.5100 (0.4859/0.0370)	mem 48464MB
[2023-11-10 07:43:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][90/1251]	eta 0:46:40 lr 0.004220	time 2.3987 (2.4124)	model_time 2.3981 (2.3959)	loss 3.5712 (2.7571)	grad_norm 0.4950 (0.4881/0.0398)	mem 48464MB
[2023-11-10 07:44:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][100/1251]	eta 0:46:14 lr 0.004148	time 2.3924 (2.4106)	model_time 2.3920 (2.3957)	loss 2.9146 (2.7622)	grad_norm 0.5117 (0.4863/0.0391)	mem 48464MB
[2023-11-10 07:44:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][110/1251]	eta 0:45:48 lr 0.004077	time 2.3961 (2.4092)	model_time 2.3958 (2.3956)	loss 2.5474 (2.7589)	grad_norm 0.4491 (0.4868/0.0388)	mem 48464MB
[2023-11-10 07:44:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][120/1251]	eta 0:45:23 lr 0.004006	time 2.3943 (2.4081)	model_time 2.3940 (2.3956)	loss 2.6125 (2.7719)	grad_norm 0.4573 (0.4855/0.0385)	mem 48464MB
[2023-11-10 07:45:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][130/1251]	eta 0:44:58 lr 0.003936	time 2.3968 (2.4071)	model_time 2.3965 (2.3956)	loss 2.8742 (2.7587)	grad_norm 0.5409 (0.4853/0.0380)	mem 48464MB
[2023-11-10 07:45:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][140/1251]	eta 0:44:33 lr 0.003867	time 2.3947 (2.4063)	model_time 2.3943 (2.3955)	loss 3.8281 (2.7654)	grad_norm 0.5961 (0.4851/0.0386)	mem 48464MB
[2023-11-10 07:46:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][150/1251]	eta 0:44:08 lr 0.003798	time 2.3950 (2.4055)	model_time 2.3945 (2.3954)	loss 1.4598 (2.7369)	grad_norm 0.4586 (0.4852/0.0376)	mem 48464MB
[2023-11-10 07:46:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][160/1251]	eta 0:43:43 lr 0.003730	time 2.3931 (2.4048)	model_time 2.3928 (2.3953)	loss 2.5360 (2.7682)	grad_norm 0.5070 (0.4870/0.0382)	mem 48464MB
[2023-11-10 07:46:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][170/1251]	eta 0:43:18 lr 0.003662	time 2.3925 (2.4042)	model_time 2.3922 (2.3953)	loss 2.9597 (2.7628)	grad_norm 0.5362 (0.4874/0.0384)	mem 48464MB
[2023-11-10 07:47:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][180/1251]	eta 0:42:54 lr 0.003595	time 2.3937 (2.4038)	model_time 2.3933 (2.3953)	loss 2.5232 (2.7644)	grad_norm 0.5034 (0.4876/0.0381)	mem 48464MB
[2023-11-10 07:47:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][190/1251]	eta 0:42:29 lr 0.003529	time 2.3951 (2.4034)	model_time 2.3948 (2.3953)	loss 3.4593 (2.7621)	grad_norm 0.4873 (0.4880/0.0381)	mem 48464MB
[2023-11-10 07:48:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][200/1251]	eta 0:42:05 lr 0.003463	time 2.3950 (2.4031)	model_time 2.3947 (2.3953)	loss 2.8699 (2.7655)	grad_norm 0.4228 (0.4885/0.0399)	mem 48464MB
[2023-11-10 07:48:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][210/1251]	eta 0:41:41 lr 0.003398	time 2.3906 (2.4027)	model_time 2.3903 (2.3953)	loss 1.9667 (2.7640)	grad_norm 0.4313 (0.4882/0.0397)	mem 48464MB
[2023-11-10 07:48:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][220/1251]	eta 0:41:16 lr 0.003333	time 2.3904 (2.4023)	model_time 2.3900 (2.3952)	loss 3.0962 (2.7647)	grad_norm 0.5638 (0.4891/0.0414)	mem 48464MB
[2023-11-10 07:49:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][230/1251]	eta 0:40:52 lr 0.003269	time 2.3949 (2.4021)	model_time 2.3945 (2.3953)	loss 3.0249 (2.7773)	grad_norm 0.5401 (0.4899/0.0418)	mem 48464MB
[2023-11-10 07:49:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][240/1251]	eta 0:40:28 lr 0.003206	time 2.3934 (2.4017)	model_time 2.3931 (2.3952)	loss 2.7884 (2.7904)	grad_norm 0.4732 (0.4897/0.0415)	mem 48464MB
[2023-11-10 07:50:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][250/1251]	eta 0:40:03 lr 0.003143	time 2.3973 (2.4015)	model_time 2.3969 (2.3953)	loss 2.5107 (2.7993)	grad_norm 0.4011 (0.4895/0.0413)	mem 48464MB
[2023-11-10 07:50:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][260/1251]	eta 0:39:39 lr 0.003081	time 2.3963 (2.4013)	model_time 2.3960 (2.3952)	loss 1.6741 (2.7934)	grad_norm 0.4696 (0.4893/0.0407)	mem 48464MB
[2023-11-10 07:50:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][270/1251]	eta 0:39:15 lr 0.003019	time 2.3929 (2.4011)	model_time 2.3924 (2.3952)	loss 1.8582 (2.7849)	grad_norm 0.4843 (0.4886/0.0405)	mem 48464MB
[2023-11-10 07:51:22 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][280/1251]	eta 0:38:51 lr 0.002958	time 2.3977 (2.4009)	model_time 2.3974 (2.3953)	loss 3.3288 (2.7880)	grad_norm 0.4854 (0.4890/0.0404)	mem 48464MB
[2023-11-10 07:51:46 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][290/1251]	eta 0:38:27 lr 0.002898	time 2.3919 (2.4007)	model_time 2.3914 (2.3952)	loss 3.2383 (2.7933)	grad_norm 0.5169 (0.4891/0.0403)	mem 48464MB
[2023-11-10 07:52:10 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][300/1251]	eta 0:38:02 lr 0.002838	time 2.3949 (2.4006)	model_time 2.3946 (2.3953)	loss 1.7902 (2.7846)	grad_norm 0.4916 (0.4891/0.0401)	mem 48464MB
[2023-11-10 07:52:34 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][310/1251]	eta 0:37:38 lr 0.002779	time 2.3972 (2.4005)	model_time 2.3968 (2.3953)	loss 2.3012 (2.7852)	grad_norm 0.4809 (0.4889/0.0404)	mem 48464MB
[2023-11-10 07:52:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][320/1251]	eta 0:37:14 lr 0.002721	time 2.3948 (2.4003)	model_time 2.3946 (2.3953)	loss 3.1183 (2.7813)	grad_norm 0.4814 (0.4891/0.0404)	mem 48464MB
[2023-11-10 07:53:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][330/1251]	eta 0:36:50 lr 0.002663	time 2.3940 (2.4001)	model_time 2.3937 (2.3953)	loss 1.9047 (2.7737)	grad_norm 0.4469 (0.4883/0.0401)	mem 48464MB
[2023-11-10 07:53:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][340/1251]	eta 0:36:26 lr 0.002606	time 2.3904 (2.4000)	model_time 2.3900 (2.3952)	loss 2.7359 (2.7754)	grad_norm 0.4501 (0.4876/0.0398)	mem 48464MB
[2023-11-10 07:54:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][350/1251]	eta 0:36:02 lr 0.002549	time 2.3952 (2.3998)	model_time 2.3944 (2.3952)	loss 2.9892 (2.7734)	grad_norm 0.5134 (0.4877/0.0403)	mem 48464MB
[2023-11-10 07:54:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][360/1251]	eta 0:35:38 lr 0.002493	time 2.3977 (2.3998)	model_time 2.3971 (2.3953)	loss 3.4052 (2.7743)	grad_norm 0.4926 (0.4876/0.0400)	mem 48464MB
[2023-11-10 07:54:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][370/1251]	eta 0:35:14 lr 0.002437	time 2.3922 (2.3996)	model_time 2.3920 (2.3952)	loss 2.5003 (2.7657)	grad_norm 0.4929 (0.4877/0.0402)	mem 48464MB
[2023-11-10 07:55:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][380/1251]	eta 0:34:50 lr 0.002383	time 2.3959 (2.3998)	model_time 2.3956 (2.3955)	loss 3.4050 (2.7664)	grad_norm 0.4898 (0.4875/0.0400)	mem 48464MB
[2023-11-10 07:55:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][390/1251]	eta 0:34:26 lr 0.002328	time 2.3931 (2.3997)	model_time 2.3924 (2.3955)	loss 3.1275 (2.7662)	grad_norm 0.4906 (0.4873/0.0390)	mem 48464MB
[2023-11-10 07:56:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][400/1251]	eta 0:34:02 lr 0.002275	time 2.3930 (2.3996)	model_time 2.3928 (2.3955)	loss 2.1969 (2.7667)	grad_norm 0.4842 (0.4884/0.0392)	mem 48464MB
[2023-11-10 07:56:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][410/1251]	eta 0:33:37 lr 0.002222	time 2.3908 (2.3995)	model_time 2.3899 (2.3954)	loss 2.1559 (2.7669)	grad_norm 0.4513 (0.4880/0.0395)	mem 48464MB
[2023-11-10 07:56:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][420/1251]	eta 0:33:13 lr 0.002170	time 2.3910 (2.3993)	model_time 2.3905 (2.3954)	loss 1.9228 (2.7700)	grad_norm 0.4359 (0.4885/0.0394)	mem 48464MB
[2023-11-10 07:57:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][430/1251]	eta 0:32:49 lr 0.002118	time 2.3991 (2.3993)	model_time 2.3988 (2.3954)	loss 2.3694 (2.7668)	grad_norm 0.4489 (0.4885/0.0394)	mem 48464MB
[2023-11-10 07:57:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][440/1251]	eta 0:32:25 lr 0.002067	time 2.3919 (2.3992)	model_time 2.3916 (2.3954)	loss 2.8787 (2.7704)	grad_norm 0.5339 (0.4886/0.0388)	mem 48464MB
[2023-11-10 07:58:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][450/1251]	eta 0:32:01 lr 0.002016	time 2.3912 (2.3991)	model_time 2.3910 (2.3954)	loss 2.6762 (2.7684)	grad_norm 0.4251 (0.4882/0.0390)	mem 48464MB
[2023-11-10 07:58:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][460/1251]	eta 0:31:37 lr 0.001966	time 2.3947 (2.3990)	model_time 2.3945 (2.3954)	loss 1.5950 (2.7595)	grad_norm 0.4616 (0.4864/0.0387)	mem 48464MB
[2023-11-10 07:58:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][470/1251]	eta 0:31:13 lr 0.001917	time 2.3922 (2.3989)	model_time 2.3919 (2.3953)	loss 2.9777 (2.7569)	grad_norm 0.4723 (0.4861/0.0386)	mem 48464MB
[2023-11-10 07:59:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][480/1251]	eta 0:30:49 lr 0.001869	time 2.3916 (2.3988)	model_time 2.3912 (2.3953)	loss 3.5614 (2.7560)	grad_norm 0.5717 (0.4862/0.0390)	mem 48464MB
[2023-11-10 07:59:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][490/1251]	eta 0:30:25 lr 0.001821	time 2.3936 (2.3987)	model_time 2.3931 (2.3953)	loss 1.4879 (2.7507)	grad_norm 0.4478 (0.4860/0.0390)	mem 48464MB
[2023-11-10 08:00:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][500/1251]	eta 0:30:01 lr 0.001773	time 2.3960 (2.3987)	model_time 2.3958 (2.3953)	loss 3.1991 (2.7510)	grad_norm 0.4977 (0.4859/0.0375)	mem 48464MB
[2023-11-10 08:00:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][510/1251]	eta 0:29:37 lr 0.001726	time 2.3940 (2.3985)	model_time 2.3937 (2.3952)	loss 2.9046 (2.7519)	grad_norm 0.5034 (0.4861/0.0373)	mem 48464MB
[2023-11-10 08:00:57 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][520/1251]	eta 0:29:13 lr 0.001680	time 2.3934 (2.3985)	model_time 2.3930 (2.3952)	loss 2.6468 (2.7520)	grad_norm 0.4821 (0.4852/0.0355)	mem 48464MB
[2023-11-10 08:01:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][530/1251]	eta 0:28:49 lr 0.001635	time 2.3995 (2.3984)	model_time 2.3991 (2.3952)	loss 1.9511 (2.7508)	grad_norm 0.4517 (0.4843/0.0348)	mem 48464MB
[2023-11-10 08:01:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][540/1251]	eta 0:28:25 lr 0.001590	time 2.3964 (2.3983)	model_time 2.3957 (2.3952)	loss 2.8835 (2.7484)	grad_norm 0.4299 (0.4836/0.0351)	mem 48464MB
[2023-11-10 08:02:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][550/1251]	eta 0:28:01 lr 0.001546	time 2.3920 (2.3985)	model_time 2.3916 (2.3954)	loss 3.0272 (2.7511)	grad_norm 0.4972 (0.4831/0.0348)	mem 48464MB
[2023-11-10 08:02:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][560/1251]	eta 0:27:37 lr 0.001502	time 2.3924 (2.3984)	model_time 2.3922 (2.3954)	loss 1.5474 (2.7521)	grad_norm 0.4590 (0.4831/0.0350)	mem 48464MB
[2023-11-10 08:02:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][570/1251]	eta 0:27:13 lr 0.001459	time 2.3913 (2.3984)	model_time 2.3909 (2.3954)	loss 2.2852 (2.7519)	grad_norm 0.4554 (0.4844/0.0357)	mem 48464MB
[2023-11-10 08:03:21 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][580/1251]	eta 0:26:49 lr 0.001416	time 2.3975 (2.3987)	model_time 2.3973 (2.3957)	loss 2.8162 (2.7524)	grad_norm 0.4542 (0.4840/0.0354)	mem 48464MB
[2023-11-10 08:03:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][590/1251]	eta 0:26:25 lr 0.001375	time 2.3978 (2.3987)	model_time 2.3976 (2.3957)	loss 3.4450 (2.7551)	grad_norm 0.5084 (0.4841/0.0359)	mem 48464MB
[2023-11-10 08:04:09 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][600/1251]	eta 0:26:01 lr 0.001333	time 2.3950 (2.3986)	model_time 2.3948 (2.3957)	loss 2.3480 (2.7531)	grad_norm 0.4731 (0.4848/0.0362)	mem 48464MB
[2023-11-10 08:04:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][610/1251]	eta 0:25:37 lr 0.001293	time 2.3974 (2.3986)	model_time 2.3969 (2.3957)	loss 3.0348 (2.7526)	grad_norm 0.5278 (0.4849/0.0359)	mem 48464MB
[2023-11-10 08:04:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][620/1251]	eta 0:25:13 lr 0.001253	time 2.3963 (2.3985)	model_time 2.3959 (2.3957)	loss 3.6093 (2.7553)	grad_norm 0.4970 (0.4854/0.0365)	mem 48464MB
[2023-11-10 08:05:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][630/1251]	eta 0:24:49 lr 0.001214	time 2.3961 (2.3985)	model_time 2.3958 (2.3957)	loss 3.5587 (2.7506)	grad_norm 0.5568 (0.4854/0.0371)	mem 48464MB
[2023-11-10 08:05:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][640/1251]	eta 0:24:25 lr 0.001175	time 2.3959 (2.3985)	model_time 2.3956 (2.3957)	loss 3.6078 (2.7560)	grad_norm 0.5481 (0.4860/0.0379)	mem 48464MB
[2023-11-10 08:06:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][650/1251]	eta 0:24:01 lr 0.001137	time 2.3980 (2.3984)	model_time 2.3976 (2.3957)	loss 2.4949 (2.7566)	grad_norm 0.4935 (0.4851/0.0368)	mem 48464MB
[2023-11-10 08:06:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][660/1251]	eta 0:23:37 lr 0.001099	time 2.3980 (2.3984)	model_time 2.3975 (2.3957)	loss 2.6580 (2.7568)	grad_norm 0.4743 (0.4856/0.0368)	mem 48464MB
[2023-11-10 08:06:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][670/1251]	eta 0:23:13 lr 0.001063	time 2.3924 (2.3984)	model_time 2.3921 (2.3957)	loss 2.2005 (2.7578)	grad_norm 0.5694 (0.4866/0.0369)	mem 48464MB
[2023-11-10 08:07:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][680/1251]	eta 0:22:49 lr 0.001026	time 2.3930 (2.3983)	model_time 2.3928 (2.3957)	loss 2.4459 (2.7546)	grad_norm 0.4713 (0.4878/0.0366)	mem 48464MB
[2023-11-10 08:07:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][690/1251]	eta 0:22:25 lr 0.000991	time 2.3976 (2.3983)	model_time 2.3973 (2.3957)	loss 3.1398 (2.7569)	grad_norm 0.5191 (0.4877/0.0365)	mem 48464MB
[2023-11-10 08:08:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][700/1251]	eta 0:22:01 lr 0.000956	time 2.3933 (2.3982)	model_time 2.3929 (2.3956)	loss 3.2999 (2.7549)	grad_norm 0.5367 (0.4868/0.0363)	mem 48464MB
[2023-11-10 08:08:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][710/1251]	eta 0:21:37 lr 0.000921	time 2.3950 (2.3981)	model_time 2.3948 (2.3956)	loss 2.2302 (2.7551)	grad_norm 0.4486 (0.4875/0.0365)	mem 48464MB
[2023-11-10 08:08:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][720/1251]	eta 0:21:13 lr 0.000888	time 2.3934 (2.3981)	model_time 2.3930 (2.3956)	loss 3.0054 (2.7556)	grad_norm 0.4685 (0.4868/0.0364)	mem 48464MB
[2023-11-10 08:09:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][730/1251]	eta 0:20:49 lr 0.000855	time 2.3977 (2.3980)	model_time 2.3972 (2.3956)	loss 1.6213 (2.7527)	grad_norm 0.4453 (0.4866/0.0369)	mem 48464MB
[2023-11-10 08:09:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][740/1251]	eta 0:20:25 lr 0.000822	time 2.3916 (2.3980)	model_time 2.3912 (2.3956)	loss 2.9387 (2.7512)	grad_norm 0.5332 (0.4863/0.0373)	mem 48464MB
[2023-11-10 08:10:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][750/1251]	eta 0:20:01 lr 0.000790	time 2.3959 (2.3981)	model_time 2.3953 (2.3957)	loss 2.4610 (2.7530)	grad_norm 0.4797 (0.4865/0.0370)	mem 48464MB
[2023-11-10 08:10:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][760/1251]	eta 0:19:37 lr 0.000759	time 2.3933 (2.3981)	model_time 2.3928 (2.3957)	loss 2.5454 (2.7526)	grad_norm 0.5025 (0.4874/0.0367)	mem 48464MB
[2023-11-10 08:10:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][770/1251]	eta 0:19:13 lr 0.000729	time 2.3929 (2.3980)	model_time 2.3926 (2.3957)	loss 2.4187 (2.7522)	grad_norm 0.5118 (0.4881/0.0368)	mem 48464MB
[2023-11-10 08:11:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][780/1251]	eta 0:18:49 lr 0.000699	time 2.3928 (2.3980)	model_time 2.3924 (2.3956)	loss 1.7843 (2.7528)	grad_norm 0.4179 (0.4872/0.0363)	mem 48464MB
[2023-11-10 08:11:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][790/1251]	eta 0:18:25 lr 0.000669	time 2.3928 (2.3979)	model_time 2.3924 (2.3956)	loss 3.3263 (2.7540)	grad_norm 0.4457 (0.4868/0.0365)	mem 48464MB
[2023-11-10 08:12:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][800/1251]	eta 0:18:01 lr 0.000641	time 2.3935 (2.3981)	model_time 2.3931 (2.3958)	loss 1.5868 (2.7539)	grad_norm 0.4560 (0.4863/0.0364)	mem 48464MB
[2023-11-10 08:12:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][810/1251]	eta 0:17:37 lr 0.000613	time 2.3957 (2.3981)	model_time 2.3953 (2.3958)	loss 3.5102 (2.7573)	grad_norm 0.5019 (0.4859/0.0363)	mem 48464MB
[2023-11-10 08:12:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][820/1251]	eta 0:17:13 lr 0.000585	time 2.3945 (2.3981)	model_time 2.3942 (2.3958)	loss 2.7234 (2.7577)	grad_norm 0.4820 (0.4860/0.0364)	mem 48464MB
[2023-11-10 08:13:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][830/1251]	eta 0:16:49 lr 0.000558	time 2.3894 (2.3980)	model_time 2.3888 (2.3958)	loss 1.8625 (2.7569)	grad_norm 0.4522 (0.4860/0.0363)	mem 48464MB
[2023-11-10 08:13:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][840/1251]	eta 0:16:25 lr 0.000532	time 2.3914 (2.3980)	model_time 2.3911 (2.3958)	loss 2.7148 (2.7528)	grad_norm 0.4553 (0.4864/0.0358)	mem 48464MB
[2023-11-10 08:14:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][850/1251]	eta 0:16:01 lr 0.000507	time 2.3980 (2.3981)	model_time 2.3978 (2.3959)	loss 2.9178 (2.7519)	grad_norm 0.4788 (0.4865/0.0360)	mem 48464MB
[2023-11-10 08:14:32 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][860/1251]	eta 0:15:37 lr 0.000482	time 2.3947 (2.3981)	model_time 2.3944 (2.3959)	loss 2.8396 (2.7509)	grad_norm 0.4804 (0.4861/0.0361)	mem 48464MB
[2023-11-10 08:14:56 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][870/1251]	eta 0:15:13 lr 0.000457	time 2.3949 (2.3980)	model_time 2.3941 (2.3958)	loss 2.7322 (2.7487)	grad_norm 0.5461 (0.4854/0.0360)	mem 48464MB
[2023-11-10 08:15:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][880/1251]	eta 0:14:49 lr 0.000434	time 2.3944 (2.3980)	model_time 2.3941 (2.3958)	loss 2.7236 (2.7461)	grad_norm 0.4460 (0.4844/0.0361)	mem 48464MB
[2023-11-10 08:15:44 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][890/1251]	eta 0:14:25 lr 0.000411	time 2.3950 (2.3980)	model_time 2.3947 (2.3958)	loss 3.1322 (2.7469)	grad_norm 0.4509 (0.4844/0.0354)	mem 48464MB
[2023-11-10 08:16:08 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][900/1251]	eta 0:14:01 lr 0.000388	time 2.3917 (2.3979)	model_time 2.3914 (2.3958)	loss 3.5655 (2.7463)	grad_norm 0.4883 (0.4837/0.0351)	mem 48464MB
[2023-11-10 08:16:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][910/1251]	eta 0:13:37 lr 0.000366	time 2.3947 (2.3979)	model_time 2.3943 (2.3958)	loss 2.8868 (2.7474)	grad_norm 0.5319 (0.4840/0.0355)	mem 48464MB
[2023-11-10 08:16:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][920/1251]	eta 0:13:13 lr 0.000345	time 2.3950 (2.3978)	model_time 2.3947 (2.3957)	loss 2.8554 (2.7460)	grad_norm 0.5006 (0.4833/0.0351)	mem 48464MB
[2023-11-10 08:17:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][930/1251]	eta 0:12:49 lr 0.000325	time 2.3923 (2.3978)	model_time 2.3920 (2.3957)	loss 2.9789 (2.7434)	grad_norm 0.5011 (0.4833/0.0343)	mem 48464MB
[2023-11-10 08:17:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][940/1251]	eta 0:12:25 lr 0.000305	time 2.3928 (2.3977)	model_time 2.3925 (2.3957)	loss 1.6828 (2.7398)	grad_norm 0.4159 (0.4826/0.0338)	mem 48464MB
[2023-11-10 08:18:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][950/1251]	eta 0:12:01 lr 0.000286	time 2.3981 (2.3977)	model_time 2.3976 (2.3957)	loss 3.3116 (2.7390)	grad_norm 0.5527 (0.4833/0.0342)	mem 48464MB
[2023-11-10 08:18:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][960/1251]	eta 0:11:37 lr 0.000267	time 2.3920 (2.3977)	model_time 2.3917 (2.3956)	loss 2.2788 (2.7378)	grad_norm 0.4529 (0.4833/0.0343)	mem 48464MB
[2023-11-10 08:18:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][970/1251]	eta 0:11:13 lr 0.000249	time 2.3924 (2.3976)	model_time 2.3918 (2.3956)	loss 3.1312 (2.7385)	grad_norm 0.5011 (0.4823/0.0336)	mem 48464MB
[2023-11-10 08:19:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][980/1251]	eta 0:10:49 lr 0.000231	time 2.3952 (2.3976)	model_time 2.3948 (2.3956)	loss 2.8271 (2.7385)	grad_norm 0.5478 (0.4819/0.0340)	mem 48464MB
[2023-11-10 08:19:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][990/1251]	eta 0:10:25 lr 0.000215	time 2.3975 (2.3976)	model_time 2.3971 (2.3956)	loss 3.3040 (2.7389)	grad_norm 0.4871 (0.4818/0.0342)	mem 48464MB
[2023-11-10 08:20:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1000/1251]	eta 0:10:01 lr 0.000199	time 2.3940 (2.3976)	model_time 2.3936 (2.3956)	loss 3.0521 (2.7397)	grad_norm 0.5421 (0.4822/0.0343)	mem 48464MB
[2023-11-10 08:20:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1010/1251]	eta 0:09:37 lr 0.000183	time 2.3914 (2.3976)	model_time 2.3910 (2.3956)	loss 1.3879 (2.7393)	grad_norm 0.4461 (0.4812/0.0335)	mem 48464MB
[2023-11-10 08:20:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1020/1251]	eta 0:09:13 lr 0.000168	time 2.3946 (2.3977)	model_time 2.3941 (2.3957)	loss 3.5783 (2.7386)	grad_norm 0.4743 (0.4819/0.0334)	mem 48464MB
[2023-11-10 08:21:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1030/1251]	eta 0:08:49 lr 0.000154	time 2.3920 (2.3976)	model_time 2.3917 (2.3957)	loss 2.7735 (2.7416)	grad_norm 0.5173 (0.4828/0.0340)	mem 48464MB
[2023-11-10 08:21:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1040/1251]	eta 0:08:25 lr 0.000140	time 2.3918 (2.3976)	model_time 2.3916 (2.3957)	loss 3.6735 (2.7428)	grad_norm 0.5061 (0.4825/0.0338)	mem 48464MB
[2023-11-10 08:22:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1050/1251]	eta 0:08:01 lr 0.000127	time 2.4988 (2.3976)	model_time 2.4986 (2.3958)	loss 2.5913 (2.7419)	grad_norm 0.4585 (0.4828/0.0341)	mem 48464MB
[2023-11-10 08:22:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1060/1251]	eta 0:07:37 lr 0.000115	time 2.3908 (2.3976)	model_time 2.3903 (2.3958)	loss 2.9159 (2.7425)	grad_norm 0.4213 (0.4824/0.0342)	mem 48464MB
[2023-11-10 08:22:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1070/1251]	eta 0:07:13 lr 0.000103	time 2.3979 (2.3976)	model_time 2.3974 (2.3957)	loss 2.8224 (2.7434)	grad_norm 0.5182 (0.4816/0.0341)	mem 48464MB
[2023-11-10 08:23:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1080/1251]	eta 0:06:49 lr 0.000092	time 2.3963 (2.3976)	model_time 2.3959 (2.3957)	loss 2.8931 (2.7421)	grad_norm 0.4993 (0.4824/0.0341)	mem 48464MB
[2023-11-10 08:23:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1090/1251]	eta 0:06:26 lr 0.000082	time 2.3967 (2.3976)	model_time 2.3960 (2.3957)	loss 2.7488 (2.7427)	grad_norm 0.4401 (0.4828/0.0338)	mem 48464MB
[2023-11-10 08:24:07 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1100/1251]	eta 0:06:02 lr 0.000072	time 2.3976 (2.3975)	model_time 2.3970 (2.3957)	loss 2.6834 (2.7410)	grad_norm 0.5249 (0.4831/0.0338)	mem 48464MB
[2023-11-10 08:24:31 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1110/1251]	eta 0:05:38 lr 0.000063	time 2.3928 (2.3975)	model_time 2.3924 (2.3957)	loss 1.4461 (2.7363)	grad_norm 0.5031 (0.4835/0.0344)	mem 48464MB
[2023-11-10 08:24:55 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1120/1251]	eta 0:05:14 lr 0.000054	time 2.3915 (2.3975)	model_time 2.3912 (2.3957)	loss 2.9753 (2.7361)	grad_norm 0.5323 (0.4833/0.0343)	mem 48464MB
[2023-11-10 08:25:19 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1130/1251]	eta 0:04:50 lr 0.000046	time 2.3972 (2.3975)	model_time 2.3969 (2.3957)	loss 2.3496 (2.7357)	grad_norm 0.4442 (0.4831/0.0346)	mem 48464MB
[2023-11-10 08:25:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1140/1251]	eta 0:04:26 lr 0.000039	time 2.3940 (2.3975)	model_time 2.3935 (2.3957)	loss 2.3014 (2.7337)	grad_norm 0.4820 (0.4830/0.0347)	mem 48464MB
[2023-11-10 08:26:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1150/1251]	eta 0:04:02 lr 0.000032	time 2.3964 (2.3974)	model_time 2.3961 (2.3957)	loss 3.3392 (2.7354)	grad_norm 0.4418 (0.4834/0.0346)	mem 48464MB
[2023-11-10 08:26:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1160/1251]	eta 0:03:38 lr 0.000026	time 2.3940 (2.3974)	model_time 2.3937 (2.3957)	loss 3.7036 (2.7335)	grad_norm 0.5612 (0.4834/0.0348)	mem 48464MB
[2023-11-10 08:26:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1170/1251]	eta 0:03:14 lr 0.000021	time 2.3966 (2.3974)	model_time 2.3961 (2.3957)	loss 2.1708 (2.7326)	grad_norm 0.4826 (0.4833/0.0339)	mem 48464MB
[2023-11-10 08:27:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1180/1251]	eta 0:02:50 lr 0.000016	time 2.3951 (2.3974)	model_time 2.3945 (2.3957)	loss 2.4054 (2.7309)	grad_norm 0.4880 (0.4839/0.0337)	mem 48464MB
[2023-11-10 08:27:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1190/1251]	eta 0:02:26 lr 0.000012	time 2.3920 (2.3974)	model_time 2.3912 (2.3956)	loss 1.8306 (2.7303)	grad_norm 0.4371 (0.4840/0.0343)	mem 48464MB
[2023-11-10 08:28:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1200/1251]	eta 0:02:02 lr 0.000008	time 2.3954 (2.3973)	model_time 2.3950 (2.3956)	loss 2.0558 (2.7291)	grad_norm 0.4875 (0.4839/0.0341)	mem 48464MB
[2023-11-10 08:28:30 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1210/1251]	eta 0:01:38 lr 0.000005	time 2.3892 (2.3973)	model_time 2.3888 (2.3956)	loss 3.0774 (2.7316)	grad_norm 0.5087 (0.4835/0.0337)	mem 48464MB
[2023-11-10 08:28:54 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1220/1251]	eta 0:01:14 lr 0.000003	time 2.3938 (2.3974)	model_time 2.3935 (2.3957)	loss 2.7977 (2.7315)	grad_norm 0.4759 (0.4832/0.0334)	mem 48464MB
[2023-11-10 08:29:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1230/1251]	eta 0:00:50 lr 0.000001	time 2.3937 (2.3974)	model_time 2.3934 (2.3957)	loss 3.2056 (2.7314)	grad_norm 0.4442 (0.4837/0.0338)	mem 48464MB
[2023-11-10 08:29:42 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1240/1251]	eta 0:00:26 lr 0.000000	time 2.3913 (2.3973)	model_time 2.3911 (2.3957)	loss 1.6650 (2.7313)	grad_norm 0.4974 (0.4840/0.0344)	mem 48464MB
[2023-11-10 08:30:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 506): INFO Train: [9/10][1250/1251]	eta 0:00:02 lr 0.000000	time 2.3926 (2.3973)	model_time 2.3925 (2.3956)	loss 3.1776 (2.7320)	grad_norm 0.5001 (0.4838/0.0341)	mem 48464MB
[2023-11-10 08:30:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 515): INFO EPOCH 9 training takes 0:49:59
[2023-11-10 08:30:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 357): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_9.pth saving......
[2023-11-10 08:32:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 359): INFO work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_9.pth saved !!!
[2023-11-10 08:32:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 3.771 (3.771)	Loss 0.6060 (0.6060)	Acc@1 88.281 (88.281)	Acc@5 98.730 (98.730)	Mem 48464MB
[2023-11-10 08:32:27 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.248 (2.377)	Loss 0.6997 (0.6056)	Acc@1 86.426 (88.397)	Acc@5 98.047 (98.420)	Mem 48464MB
[2023-11-10 08:32:50 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.251 (2.317)	Loss 0.5410 (0.6071)	Acc@1 90.137 (88.305)	Acc@5 98.730 (98.447)	Mem 48464MB
[2023-11-10 08:33:12 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.252 (2.296)	Loss 0.6655 (0.6123)	Acc@1 86.621 (88.146)	Acc@5 98.145 (98.478)	Mem 48464MB
[2023-11-10 08:33:35 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.251 (2.285)	Loss 0.6733 (0.6147)	Acc@1 86.816 (88.162)	Acc@5 97.852 (98.447)	Mem 48464MB
[2023-11-10 08:33:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:9] * Acc@1 88.192 Acc@5 98.460
[2023-11-10 08:33:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 339): INFO Accuracy of the network on the 50000 test images: 88.2%
[2023-11-10 08:33:52 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 354): INFO Max accuracy: 88.23%
[2023-11-10 08:33:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 5.438 (5.438)	Loss 0.5991 (0.5991)	Acc@1 88.281 (88.281)	Acc@5 98.730 (98.730)	Mem 48464MB
[2023-11-10 08:34:20 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.254 (2.540)	Loss 0.6929 (0.5989)	Acc@1 86.328 (88.406)	Acc@5 98.047 (98.420)	Mem 48464MB
[2023-11-10 08:34:43 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.253 (2.404)	Loss 0.5337 (0.6004)	Acc@1 90.137 (88.318)	Acc@5 98.730 (98.451)	Mem 48464MB
[2023-11-10 08:35:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.252 (2.355)	Loss 0.6592 (0.6054)	Acc@1 86.621 (88.171)	Acc@5 98.145 (98.472)	Mem 48464MB
[2023-11-10 08:35:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.253 (2.330)	Loss 0.6665 (0.6077)	Acc@1 86.914 (88.181)	Acc@5 97.754 (98.445)	Mem 48464MB
[2023-11-10 08:35:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 575): INFO [Epoch:9] * Acc@1 88.202 Acc@5 98.460
[2023-11-10 08:35:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 359): INFO Accuracy of the ema network on the 50000 test images: 88.2%
[2023-11-10 08:35:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 374): INFO Max ema accuracy: 88.23%
[2023-11-10 08:35:45 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 378): INFO Training time 9:44:56
[2023-11-10 11:25:58 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 663): INFO Full config saved to work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/config.json
[2023-11-10 11:25:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 666): INFO AMP_OPT_LEVEL: O1
AMP_TYPE: float16
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 1.0
  CUTMIX_MINMAX: null
  MEAN:
  - 0.485
  - 0.456
  - 0.406
  MIXUP: 0.8
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RANDOM_RESIZED_CROP: false
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
  STD:
  - 0.229
  - 0.224
  - 0.225
BASE:
- ''
DATA:
  BATCH_SIZE: 128
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /mnt/petrelfs/share/images
  IMG_ON_MEMORY: false
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  TRANSFORM: build_transform_for_linear_probe
  ZIP_MODE: false
EVAL_22K_TO_1K: false
EVAL_FREQ: 1
EVAL_MODE: true
LOCAL_RANK: 0
MODEL:
  DROP_PATH_RATE: 0.0
  DROP_PATH_TYPE: linear
  DROP_RATE: 0.0
  INTERN_IMAGE:
    CENTER_FEATURE_SCALE: false
    CHANNELS: 64
    CORE_OP: DCNv3
    DEPTHS:
    - 4
    - 4
    - 18
    - 4
    DW_KERNEL_SIZE: null
    GROUPS:
    - 4
    - 8
    - 16
    - 32
    LAYER_SCALE: null
    LEVEL2_POST_NORM: false
    LEVEL2_POST_NORM_BLOCK_IDS: null
    MLP_RATIO: 4.0
    OFFSET_SCALE: 1.0
    POST_NORM: false
    REMOVE_CENTER: false
    RES_POST_NORM: false
    USE_CLIP_PROJECTOR: false
  INTERN_VIT_6B:
    CLS_TARGET: cls_patch_concat
    DEPTH: 48
    EMBED_DIM: 3200
    FREEZE_VIT: true
    INIT_VALUES: 0.1
    MLP_RATIO: 4
    NUM_HEADS: 25
    OUT_INDICES:
    - 47
    PATCH_SIZE: 14
    PRETRAINED: ./pretrained/intern_vit_6b_224px.pth
    PRETRAIN_SIZE: 224
    QKV_BIAS: false
    QK_NORMALIZATION: true
    USE_FLASH_ATTN: true
  LABEL_SMOOTHING: 0.1
  NAME: intern_vit_6b_1k_224_cls_patch_sgd_lr0.1
  NUM_CLASSES: 1000
  PRETRAINED: ''
  RESUME: work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth
  TYPE: intern_vit_6b
OUTPUT: work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1
PRINT_FREQ: 10
SAVE_CKPT_NUM: 1
SAVE_FREQ: 1
SEED: 0
TAG: default
TEST:
  CROP: true
  SEQUENTIAL: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 1
  AUTO_RESUME: true
  BASE_LR: 0.2
  CLIP_GRAD: 5.0
  EMA:
    DECAY: 0.998
    ENABLE: true
  EPOCHS: 10
  LR_LAYER_DECAY: false
  LR_LAYER_DECAY_RATIO: 0.875
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    NAME: cosine
  MIN_LR: 0.0
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    DCN_LR_MUL: null
    EPS: 1.0e-08
    FREEZE_BACKBONE: null
    MOMENTUM: 0.9
    NAME: sgd
    USE_ZERO: false
  RAND_INIT_FT_HEAD: false
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 1
  WARMUP_LR: 0.0
  WEIGHT_DECAY: 0.0

[2023-11-10 11:25:59 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 173): INFO Creating model:intern_vit_6b/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1
[2023-11-10 11:27:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 176): INFO InternViT6B(
  (patch_embed): PatchEmbed(
    (proj): Conv2d(3, 3200, kernel_size=(14, 14), stride=(14, 14))
    (norm): Identity()
  )
  (pos_drop): Identity()
  (blocks): ModuleList(
    (0): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (1): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (2): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (3): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (4): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (5): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (6): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (7): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (8): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (9): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (10): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (11): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (12): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (13): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (14): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (15): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (16): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (17): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (18): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (19): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (20): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (21): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (22): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (23): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (24): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (25): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (26): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (27): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (28): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (29): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (30): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (31): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (32): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (33): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (34): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (35): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (36): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (37): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (38): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (39): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (40): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (41): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (42): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (43): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (44): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (45): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (46): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
    (47): Block(
      (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (attn): Attention(
        (qkv): Linear(in_features=3200, out_features=9600, bias=False)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=3200, out_features=3200, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
        (inner_attn): FlashAttention()
        (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=3200, out_features=12800, bias=True)
        (act): GELU(approximate='none')
        (drop1): Dropout(p=0.0, inplace=False)
        (fc2): Linear(in_features=12800, out_features=3200, bias=True)
        (drop2): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
  )
  (clip_projector): AttentionPoolingBlock(
    (norm1_q): LayerNorm((3200,), eps=1e-05, elementwise_affine=True)
    (norm1_k): LayerNorm((3200,), eps=1e-05, elementwise_affine=True)
    (norm1_v): LayerNorm((3200,), eps=1e-05, elementwise_affine=True)
    (cross_attn): CrossAttention(
      (q): Linear(in_features=3200, out_features=3200, bias=False)
      (k): Linear(in_features=3200, out_features=3200, bias=False)
      (v): Linear(in_features=3200, out_features=3200, bias=False)
      (attn_drop): Dropout(p=0.0, inplace=False)
      (proj): Linear(in_features=3200, out_features=768, bias=True)
      (proj_drop): Dropout(p=0.0, inplace=False)
    )
    (drop_path): Identity()
  )
  (norm): SyncBatchNorm(6400, eps=1e-06, momentum=0.1, affine=True, track_running_stats=True)
  (head): Linear(in_features=6400, out_features=1000, bias=True)
)
[2023-11-10 11:27:33 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 212): INFO Using native Torch AMP. Training in mixed precision.
[2023-11-10 11:27:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 232): INFO number of params: 6413800
[2023-11-10 11:27:38 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 58): INFO ==============> Resuming form work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth....................
[2023-11-10 11:29:00 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 90): INFO <All keys matched successfully>
[2023-11-10 11:29:18 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 12.439 (12.439)	Loss 0.5898 (0.5898)	Acc@1 88.281 (88.281)	Acc@5 98.730 (98.730)	Mem 25669MB
[2023-11-10 11:29:40 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.240 (3.155)	Loss 0.6816 (0.5884)	Acc@1 86.230 (88.459)	Acc@5 98.047 (98.420)	Mem 25669MB
[2023-11-10 11:30:02 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.249 (2.723)	Loss 0.5234 (0.5901)	Acc@1 90.137 (88.342)	Acc@5 98.730 (98.438)	Mem 25669MB
[2023-11-10 11:30:25 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.250 (2.571)	Loss 0.6504 (0.5953)	Acc@1 86.426 (88.193)	Acc@5 98.145 (98.466)	Mem 25669MB
[2023-11-10 11:30:47 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.250 (2.493)	Loss 0.6567 (0.5976)	Acc@1 86.816 (88.207)	Acc@5 97.852 (98.452)	Mem 25669MB
[2023-11-10 11:31:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 579): INFO  * Acc@1 88.234 Acc@5 98.466
[2023-11-10 11:31:05 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 276): INFO Accuracy of the network on the 50000 test images: 88.2%
[2023-11-10 11:31:06 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 22): INFO ==============> Resuming form work_dirs/intern_vit_6b_1k_224_cls_patch_sgd_lr0.1/ckpt_epoch_ema_best.pth....................
[2023-11-10 11:32:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 42): INFO <All keys matched successfully>
[2023-11-10 11:32:28 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (utils.py 43): INFO Loaded state_dict_ema
[2023-11-10 11:32:41 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [0/49]	Time 5.421 (5.421)	Loss 0.5981 (0.5981)	Acc@1 88.379 (88.379)	Acc@5 98.730 (98.730)	Mem 48434MB
[2023-11-10 11:33:04 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [10/49]	Time 2.245 (2.526)	Loss 0.6938 (0.5984)	Acc@1 86.328 (88.485)	Acc@5 97.852 (98.411)	Mem 48434MB
[2023-11-10 11:33:26 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [20/49]	Time 2.251 (2.395)	Loss 0.5332 (0.6000)	Acc@1 90.039 (88.360)	Acc@5 98.730 (98.442)	Mem 48434MB
[2023-11-10 11:33:49 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [30/49]	Time 2.251 (2.349)	Loss 0.6592 (0.6051)	Acc@1 86.621 (88.202)	Acc@5 98.145 (98.475)	Mem 48434MB
[2023-11-10 11:34:11 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 568): INFO Test: [40/49]	Time 2.252 (2.326)	Loss 0.6655 (0.6075)	Acc@1 86.914 (88.191)	Acc@5 97.852 (98.454)	Mem 48434MB
[2023-11-10 11:34:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 579): INFO  * Acc@1 88.228 Acc@5 98.472
[2023-11-10 11:34:29 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 296): INFO Accuracy of the ema network on the 50000 test images: 88.2%
