HF <-> megatron checkpoint reshaping and conversion for GPT (#19317)
* HF <-> megatron checkpoint conversion handling reshaping from different tensor and parallel sizes
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* addressing comments
* add doc strings and 🐛 fixes
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>