"git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "c49ce3c722c35324803e40efb88b1a3057c7f249"
Fix gradient overflow issue in attention masking
This fix is in reference to issue #382. GPT2 can now be trained in mixed precision, which I've confirmed with testing. I also tested unconditional generation on multiple seeds before and after changing the attention mask fill constant from 1e10 to 1e4, and there was no difference in the output. Please let me know if there is anything else I can do to make this pull request better. Thanks for all your work!
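For context, a minimal sketch of the kind of change involved, not the exact transformers code: the function and variable names below are illustrative. The point is that a fill constant of 1e10 is not representable in fp16 (whose maximum is roughly 65504), so it overflows to infinity and produces NaNs under mixed-precision training, while 1e4 is still large enough to drive masked positions to zero after the softmax.

```python
import torch

def apply_causal_mask(attn_scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # mask is 1 for positions that may be attended to, 0 for masked positions.
    # A fill value of -1e10 overflows to -inf in fp16 and leads to NaN
    # gradients; -1e4 stays representable while still zeroing out the
    # masked positions after the softmax.
    return attn_scores * mask - 1e4 * (1 - mask)

# Example: fp16 attention scores with a lower-triangular (causal) mask.
scores = torch.randn(1, 4, 4, dtype=torch.float16)
causal = torch.tril(torch.ones(4, 4, dtype=torch.float16))
probs = torch.softmax(apply_causal_mask(scores, causal), dim=-1)
```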