"tests/models/bert/test_modeling_bert.py" did not exist on "fda703a55374b3caaf4e886016f7de5810fa3571"
🚨 🚨 🚨 Fix ViT parameter initialization (#19341)
This PR aims to rectify the discrepancy between the training performance of the HF and Timm ViT implementations.
- Initializes torch and flax ViT dense layer weights with trunc_normal instead of normal (consistent with the TF implementation).
- Initializes cls_token and positional_embeddings with trunc_normal.
- Updates the DeiT copy to reflect the changes.
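
For readers unfamiliar with the distinction, here is a minimal plain-Python sketch of what a truncated-normal initializer does compared with a plain normal one. It is not the code from this PR: the helper names `trunc_normal_sample` and `init_weights`, the `std=0.02` default, and the cutoff at two standard deviations are all assumptions chosen for illustration. In PyTorch the corresponding built-in is `torch.nn.init.trunc_normal_`, whose bounds `a` and `b` are absolute values rather than multiples of `std`.

```python
import random


def trunc_normal_sample(mean, std, a, b, rng):
    """Draw one value from N(mean, std^2) restricted to [a, b].

    Uses simple rejection sampling: redraw until the value is in range.
    This is fine when [a, b] covers most of the probability mass.
    """
    if not a < b:
        raise ValueError("lower bound a must be smaller than upper bound b")
    while True:
        x = rng.gauss(mean, std)
        if a <= x <= b:
            return x


def init_weights(rows, cols, std=0.02, seed=0):
    """Build a rows x cols weight matrix with truncated-normal entries.

    Assumption for this sketch: values are cut off at +/- 2 * std.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    bound = 2.0 * std
    return [
        [trunc_normal_sample(0.0, std, -bound, bound, rng) for _ in range(cols)]
        for _ in range(rows)
    ]


if __name__ == "__main__":
    weights = init_weights(4, 8)
    largest = max(abs(w) for row in weights for w in row)
    print(f"largest |weight| = {largest:.4f}")
```

Unlike plain normal sampling, no entry can land arbitrarily far in the tails, so every initial weight stays inside the chosen interval.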