Fix bug in gpt2's (from-scratch) special scaled weight initialization (#17877)
* apply the special scaled init to each gpt2 c_proj weight only once, on an exact name match
* fix double quotes
Co-authored-by: leandro <leandro.vonwerra@spoud.io>
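
To illustrate why an exact match matters, here is a minimal pure-Python sketch, not the actual diff. It assumes the init hook runs once per module and, on each run, walks that module's parameter names relative to itself, so a loose substring test also fires from ancestor modules. The toy tree, the helper names (`named_parameters`, `modules`, `count_inits`, `scaled_std`) and both matching rules are hypothetical stand-ins for illustration; the `initializer_range / sqrt(2 * n_layer)` scale is likewise stated as an assumption about the intended standard deviation.

```python
import math
from collections import Counter

# Hypothetical toy block: nested dicts are modules, None marks a parameter.
TOY_BLOCK = {
    "attn": {"c_attn": {"weight": None, "bias": None},
             "c_proj": {"weight": None, "bias": None}},
    "mlp": {"c_fc": {"weight": None, "bias": None},
            "c_proj": {"weight": None, "bias": None}},
}

def named_parameters(tree, prefix=""):
    """Yield dotted parameter names relative to `tree`."""
    for key, value in tree.items():
        if value is None:
            yield prefix + key
        else:
            yield from named_parameters(value, prefix + key + ".")

def modules(tree, path=""):
    """Yield (path_prefix, subtree) for the tree and every submodule."""
    yield path, tree
    for key, value in tree.items():
        if value is not None:
            yield from modules(value, path + key + ".")

def count_inits(tree, matches):
    """Count how often each parameter would be re-initialized when the
    init hook visits every module and tests names relative to it."""
    counts = Counter()
    for path, subtree in modules(tree):
        for name in named_parameters(subtree):
            if matches(name):
                counts[path + name] += 1
    return counts

def loose_match(name):
    # Assumed pre-fix style rule: substring test, also fires from ancestors.
    return "c_proj" in name and "weight" in name

def exact_match(name):
    # Assumed post-fix style rule: fires only from the direct parent module.
    return name == "c_proj.weight"

def scaled_std(initializer_range, n_layer):
    """Assumed scaled standard deviation for residual projections."""
    return initializer_range / math.sqrt(2 * n_layer)

loose = count_inits(TOY_BLOCK, loose_match)
exact = count_inits(TOY_BLOCK, exact_match)
print("loose:", dict(loose))
print("exact:", dict(exact))
```

In this toy tree the loose rule touches each `c_proj` weight more than once, and a deeper tree would add one more hit per extra ancestor, while the exact rule touches each weight exactly once.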