Update: ignore padding support for TransfoXL training when n_clusters==0 (#22457)
* Update: ignore padding support for TransfoXL training when n_clusters==0
* Update: transformer XL always pad
* Update: drop doc