Add a use_parallel_residual argument to control how the residual is computed (#18695)
* Add a gpt_j_residual argument to control how the residual is computed
* Put duplicate code outside of the if block
* Rename parameter "gpt_j_residual" to "use_parallel_residual" and set the default value to True
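The commit message does not spell out the two residual formulas. The sketch below assumes the common formulation: a "parallel" (GPT-J style) block adds attention and MLP outputs to the same input in one step, and a sequential block feeds the attention residual into the MLP. The names `block_forward`, `attn`, `mlp`, `ln1`, `ln2` and the use of two separate layer norms are hypothetical and chosen for illustration. The attention call is shared by both branches, which mirrors the "duplicate code outside of the if block" change.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


def block_forward(
    x: Vector,
    attn: Sublayer,  # hypothetical attention sublayer
    mlp: Sublayer,   # hypothetical feed-forward sublayer
    ln1: Sublayer,   # hypothetical norm before attention
    ln2: Sublayer,   # hypothetical norm before the MLP (assumed separate)
    use_parallel_residual: bool = True,  # default True, as in the commit
) -> Vector:
    # Shared by both modes, so it lives outside the if block.
    attn_out = attn(ln1(x))
    if use_parallel_residual:
        # Parallel: the MLP reads the same input x as attention,
        # and both outputs are added to x in a single step.
        mlp_out = mlp(ln2(x))
        return add(x, attn_out, mlp_out)
    # Sequential: the MLP reads the output of the attention residual.
    h = add(x, attn_out)
    return add(h, mlp(ln2(h)))


if __name__ == "__main__":
    identity = lambda v: v
    attn = lambda v: [2 * a for a in v]
    mlp = lambda v: [a + 1 for a in v]
    print(block_forward([1.0, 2.0], attn, mlp, identity, identity))  # [5.0, 9.0]
    print(block_forward([1.0, 2.0], attn, mlp, identity, identity,
                        use_parallel_residual=False))  # [7.0, 13.0]
```

With toy sublayers the two modes give different results, because in the sequential mode the MLP sees `x + attn(x)` rather than `x`.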