🚨 🚨 🚨 Fix ViT parameter initialization (#19341)
This PR aims to rectify the discrepancy between the training performance of the HF and Timm ViT implementations.

- Initializes torch and flax ViT dense layer weights with trunc_normal instead of normal (consistent with the TF implementation).
- Initializes cls_token and positional_embeddings with trunc_normal.
- Updates the DeiT copy to reflect the changes.
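To illustrate how trunc_normal differs from normal, here is a minimal standard-library sketch, not the code from this PR. It draws values from a normal distribution and redraws any that fall outside fixed bounds. The helper names (`trunc_normal`, `init_weights`), the std of 0.02, and the bounds of two standard deviations are all assumptions chosen for illustration. In PyTorch the corresponding built-in is `torch.nn.init.trunc_normal_`.

```python
import random


def trunc_normal(mean=0.0, std=1.0, a=-2.0, b=2.0, rng=random):
    """Draw one sample from N(mean, std^2) restricted to [a, b].

    Uses simple rejection sampling: out-of-range draws are discarded
    and redrawn, so no value outside [a, b] is ever returned.
    """
    if not a <= mean <= b:
        # Rejection sampling gets very slow when the mean is out of range.
        raise ValueError("mean must lie inside [a, b] for this sketch")
    while True:
        x = rng.gauss(mean, std)
        if a <= x <= b:
            return x


def init_weights(n, std=0.02, seed=0):
    """Hypothetical helper: n weights truncated at two standard deviations."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [trunc_normal(0.0, std, -2 * std, 2 * std, rng) for _ in range(n)]


weights = init_weights(1000)
```

A plain normal draw has no such bound, so a small fraction of weights can start far from zero. The truncated version guarantees every initial value lies within the chosen interval.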