    Fix gradient overflow issue during attention masking · 9e666aaa
    Abhi Sharma authored
    This fix is in reference to issue #382. GPT2 can now be trained in mixed precision, which I've confirmed with testing. I also tested unconditional generation on multiple seeds before and after changing the attention-mask constant from 1e10 to 1e4, and there was no difference in the output. Please let me know if there is anything else I can do to improve this pull request. Thanks for all your work!
modeling_gpt2.py 31 KB
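
For context on why the constant matters under mixed precision, here is a minimal sketch, not the actual code in modeling_gpt2.py: float16 tops out around 65504, so a mask fill of -1e10 overflows to -inf and can propagate NaNs into gradients, whereas -1e4 is representable and still drives masked softmax probabilities to effectively zero. The helper `masked_attn_scores` and its `mask_value` argument are illustrative names introduced here, not identifiers from the repository.

```python
import math
import torch

# float16 can only represent magnitudes up to ~65504, so a mask fill of
# -1e10 overflows to -inf in half precision; -1e4 stays representable while
# still zeroing out masked positions after the softmax.
print(torch.finfo(torch.float16).max)            # 65504.0
print(torch.tensor(-1e10, dtype=torch.float16))  # tensor(-inf, dtype=torch.float16)
print(torch.tensor(-1e4, dtype=torch.float16))   # tensor(-10000., dtype=torch.float16)

# Hypothetical sketch of GPT-2-style causal masking; names are illustrative,
# not the exact code in modeling_gpt2.py.
def masked_attn_scores(q, k, mask_value=-1e4):
    # Scaled dot-product attention scores.
    w = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    nd, ns = w.size(-2), w.size(-1)
    # Lower-triangular causal mask, sliced the same way for any query length.
    b = torch.tril(torch.ones(ns, ns)).view(1, 1, ns, ns)[:, :, ns - nd:ns, :ns]
    # Masked (future) positions receive mask_value instead of -1e10.
    w = w * b + mask_value * (1 - b)
    return torch.softmax(w, dim=-1)

q = torch.randn(1, 1, 4, 8)
k = torch.randn(1, 1, 4, 8)
print(masked_attn_scores(q, k))  # lower-triangular attention weights
```

Because softmax only cares about relative differences, -1e4 is more than enough to push masked logits far below any realistic unmasked score, which is why generation output is unchanged while fp16 training no longer overflows.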