1. 06 Oct, 2022 1 commit
      🚨 🚨 🚨 Fix ViT parameter initialization (#19341) · f0b49015
      Alara Dirik authored
      This PR aims to rectify the discrepancy between the training performances of HF and Timm ViT implementations.
      
      - Initializes torch and flax ViT dense layer weights with trunc_normal instead of normal (consistent with the TF implementation).
      - Initializes cls_token and positional_embeddings with trunc_normal
      - Updates DeiT copy to reflect the changes
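
To illustrate what switching from normal to trunc_normal changes, here is a minimal, standard-library-only sketch of truncated-normal sampling by rejection. It is not the code from this commit, which relies on the frameworks' own initializers. The names `trunc_normal_sample` and `init_weights`, the `std=0.02` default, and the cutoff at two standard deviations are all assumptions chosen for illustration.

```python
import random


def trunc_normal_sample(rng, mean, std, lower, upper):
    """Draw one value from N(mean, std^2) restricted to [lower, upper].

    Hypothetical helper: uses simple rejection sampling, which is only
    efficient when [lower, upper] covers most of the probability mass.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    while True:
        x = rng.gauss(mean, std)
        if lower <= x <= upper:
            return x


def init_weights(rows, cols, mean=0.0, std=0.02, seed=None):
    """Build a rows x cols weight matrix with truncated-normal entries.

    Assumption for this sketch: values are cut off at two standard
    deviations from the mean, so no entry can be an extreme outlier.
    """
    rng = random.Random(seed)
    lower, upper = mean - 2.0 * std, mean + 2.0 * std
    return [
        [trunc_normal_sample(rng, mean, std, lower, upper) for _ in range(cols)]
        for _ in range(rows)
    ]


if __name__ == "__main__":
    weights = init_weights(4, 8, seed=0)
    largest = max(abs(v) for row in weights for v in row)
    print(f"largest absolute weight: {largest:.4f}")
```

Unlike a plain normal draw, every value produced here is guaranteed to lie inside the chosen bounds, which is the property that distinguishes the two initialization schemes.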
  2. 05 Oct, 2022 17 commits
  3. 04 Oct, 2022 18 commits
  4. 03 Oct, 2022 4 commits