gradient checkpointing for GPT-NeoX (#19946)
* gradient checkpointing for GPT-NeoX
* initialize gradient checkpointing flag
* must set flag before init
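
The last bullet suggests an ordering constraint: the module-level flag has to exist before the model's initialization hook runs. The sketch below shows one plausible reading of that constraint in plain Python. All class names here (`BaseModel`, `FlagBeforeInit`, `FlagAfterInit`, `Config`) are hypothetical stand-ins, not the actual transformers code, and the behaviour of `post_init` is an assumption made for illustration.

```python
# Hypothetical stand-ins; these are NOT the real transformers classes.
class Config:
    # Assumption: the user asked for gradient checkpointing via the config.
    gradient_checkpointing = True


class BaseModel:
    def post_init(self):
        # Assumed behaviour: the init hook honours the config-level request
        # by switching the module-level flag on.
        if getattr(self.config, "gradient_checkpointing", False):
            self.gradient_checkpointing = True


class FlagBeforeInit(BaseModel):
    def __init__(self, config):
        self.config = config
        self.gradient_checkpointing = False  # default is set first
        self.post_init()                     # hook may then switch it on


class FlagAfterInit(BaseModel):
    def __init__(self, config):
        self.config = config
        self.post_init()                     # hook switches the flag on...
        self.gradient_checkpointing = False  # ...and the default overwrites it


good = FlagBeforeInit(Config())
bad = FlagAfterInit(Config())
print(good.gradient_checkpointing, bad.gradient_checkpointing)
```

Under these assumptions, setting the default after the hook silently discards the requested setting, which is consistent with the "must set flag before init" fix.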