Xian Li authored
Summary: This diff enables layer drop in the transformer decoder in the production training pipeline (ptt_transformer). It builds on the fairseq implementation added by Angela Fan in D18094657 and adds logic to handle the corresponding dropped layers at test time in the exported model.

Reviewed By: jhcross

Differential Revision: D18165586

fbshipit-source-id: 373ac00268a25fa9e412edcb483becdfe792d992
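
For readers unfamiliar with layer drop, the sketch below illustrates the general idea only; it is not the code in this diff. It assumes that each decoder layer is skipped independently with some probability during training, and that a fixed subset of layers is kept for the exported model. All names (`run_decoder_train`, `prune_layers_for_export`, `layerdrop_p`, `keep_indices`) are hypothetical, and plain Python callables stand in for real decoder layers.

```python
import random
from typing import Callable, List, Sequence

# A "layer" is modelled as a plain function on a float (an assumption for illustration).
Layer = Callable[[float], float]


def run_decoder_train(x: float, layers: Sequence[Layer],
                      layerdrop_p: float, rng: random.Random) -> float:
    """Training-time pass: skip each layer independently with probability layerdrop_p."""
    for layer in layers:
        if rng.random() < layerdrop_p:
            continue  # this layer is dropped for the current step
        x = layer(x)
    return x


def prune_layers_for_export(layers: Sequence[Layer],
                            keep_indices: Sequence[int]) -> List[Layer]:
    """Test-time pruning: keep only a fixed, deterministic subset of layers."""
    return [layers[i] for i in keep_indices]


def run_decoder_eval(x: float, layers: Sequence[Layer]) -> float:
    """Test-time pass: apply every remaining layer, with no randomness."""
    for layer in layers:
        x = layer(x)
    return x


# Four toy layers; layer k adds k to its input.
layers = [lambda x, k=k: x + k for k in (1, 2, 3, 4)]

# Export a model that keeps only the first and third layers.
exported = prune_layers_for_export(layers, keep_indices=[0, 2])
result = run_decoder_eval(0.0, exported)
```

The point of the split is that the training pass is stochastic while the exported model is deterministic: the layers to remove are decided once, before export, so inference cost drops without any per-request randomness.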
856d8b82