Add LayerScale to NAT/DiNAT (#20325)
* Add LayerScale to NAT/DiNAT.
Completely dropped the ball on LayerScale in the original PR (#20219).
LayerScale is an optional argument in both models and is enabled only for the larger variants, where it helps stabilize training.
* Add LayerScale to NAT/DiNAT.
Minor error fixed.
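
For readers unfamiliar with LayerScale, the idea can be sketched as a learnable per-channel scale applied to a residual branch before it is added back to the input. The snippet below is a minimal, framework-free illustration only, not the code from this PR: the names `init_gamma`, `layer_scale_residual`, and `init_value` are hypothetical, and the convention that a non-positive initial value disables the scale is an assumption made for the example.

```python
from typing import Callable, List, Optional

Vector = List[float]


def init_gamma(dim: int, init_value: float) -> Optional[Vector]:
    """Build the per-channel scale, or None when LayerScale is disabled.

    Assumption for this sketch: init_value <= 0 means "LayerScale off".
    """
    return [init_value] * dim if init_value > 0 else None


def layer_scale_residual(
    x: Vector,
    branch: Callable[[Vector], Vector],
    gamma: Optional[Vector] = None,
) -> Vector:
    """Return x + gamma * branch(x), or x + branch(x) when gamma is None."""
    out = branch(x)
    if gamma is not None:
        if len(gamma) != len(out):
            raise ValueError("gamma must have one entry per channel")
        # Scale each channel of the branch output independently.
        out = [g * o for g, o in zip(gamma, out)]
    # Residual connection: add the (possibly scaled) branch back to the input.
    return [xi + oi for xi, oi in zip(x, out)]


def double(v: Vector) -> Vector:
    """Stand-in for an attention or MLP branch."""
    return [2.0 * vi for vi in v]


x = [1.0, 2.0, 3.0]
plain = layer_scale_residual(x, double, init_gamma(3, 0.0))   # LayerScale off
scaled = layer_scale_residual(x, double, init_gamma(3, 0.5))  # LayerScale on
```

In a real model the scale would be a trainable parameter, typically initialized to a small value so that each residual branch contributes little at the start of training.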
Co-authored-by: Ali Hassani <ahassanijr@gmail.com>