    layer drop · 856d8b82
    Xian Li authored
    Summary: This diff enables layer drop in the transformer decoder in the production training pipeline (ptt_transformer). It builds on the fairseq implementation (D18094657) added by Angela Fan, and adds logic to drop the corresponding layers at test time in the exported model.
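
For readers unfamiliar with layer drop, the idea in the summary can be sketched as follows. This is a minimal illustration in plain Python, not the code from this diff: the names `forward_with_layer_drop` and `prune_layers`, the callable stand-in for a decoder layer, and the choice of which layers to keep at test time are all assumptions made for the example.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical stand-in for a decoder layer: any function from state to state.
Layer = Callable[[float], float]


def forward_with_layer_drop(
    x: float,
    layers: Sequence[Layer],
    p: float,
    training: bool,
    rng: random.Random,
) -> float:
    """Apply layers in order, skipping each one with probability p in training."""
    for layer in layers:
        # Layers are only dropped at random during training; at test time
        # every layer that is present gets applied.
        if training and rng.random() < p:
            continue
        x = layer(x)
    return x


def prune_layers(layers: Sequence[Layer], keep_indices: Sequence[int]) -> List[Layer]:
    """Build the smaller stack used at test time (assumed pruning scheme)."""
    return [layers[i] for i in keep_indices]
```

At test time, a pruned stack such as `prune_layers(layers, [0, 2, 4])` would be run with `training=False`, so the layers removed from the stack are simply absent from the exported model rather than skipped at random.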
    
    Reviewed By: jhcross
    
    Differential Revision: D18165586
    
    fbshipit-source-id: 373ac00268a25fa9e412edcb483becdfe792d992
checkpoint_utils.py 16.7 KB