Added block diagonal feedforward layer.
This layer replaces the weight matrix of the output_dense layer with a block diagonal matrix, which reduces the layer's parameter count and FLOPs. An optional linear mixing layer can be added to improve the layer's expressiveness.

PiperOrigin-RevId: 418828099
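
For illustration only, here is a minimal pure-Python sketch of the idea, not the code added in this change. All function names are hypothetical, and the mixing step shown (a small num_blocks x num_blocks matrix applied across blocks) is an assumption about one way such a layer could work. With `num_blocks` equal blocks, the block diagonal weight stores roughly 1/num_blocks of the parameters of the dense matrix.

```python
def dense_params(d_in, d_out):
    """Parameter count of a full (dense) weight matrix."""
    return d_in * d_out


def block_diagonal_params(d_in, d_out, num_blocks):
    """Parameter count when only num_blocks diagonal blocks are stored."""
    if d_in % num_blocks or d_out % num_blocks:
        raise ValueError("d_in and d_out must be divisible by num_blocks")
    return num_blocks * (d_in // num_blocks) * (d_out // num_blocks)


def block_diagonal_forward(x, blocks):
    """Apply a block diagonal weight to one input vector.

    x: list of d_in floats.
    blocks: list of num_blocks matrices, each with d_in/num_blocks rows
            and d_out/num_blocks columns (nested lists).
    Each block only sees its own slice of the input.
    """
    num_blocks = len(blocks)
    chunk = len(x) // num_blocks
    out = []
    for b, w in enumerate(blocks):
        xs = x[b * chunk:(b + 1) * chunk]
        cols = len(w[0])
        for j in range(cols):
            out.append(sum(xs[i] * w[i][j] for i in range(chunk)))
    return out


def mix_blocks(y, mix):
    """Assumed form of the optional mixing step: output block c is a
    weighted sum of the input blocks, with weights mix[b][c]."""
    num_blocks = len(mix)
    chunk = len(y) // num_blocks
    out = []
    for c in range(num_blocks):
        for j in range(chunk):
            out.append(sum(mix[b][c] * y[b * chunk + j]
                           for b in range(num_blocks)))
    return out


if __name__ == "__main__":
    # Example sizes chosen for illustration, not taken from the change.
    print(dense_params(4096, 1024), block_diagonal_params(4096, 1024, 4))
    y = block_diagonal_forward([1, 2, 3, 4],
                               [[[1, 0], [0, 1]], [[2, 0], [0, 2]]])
    print(y, mix_blocks(y, [[1, 1], [0, 1]]))
```

Without mixing, each output block depends only on its own input slice; the mixing step in this sketch lets information flow between blocks at a cost of only num_blocks * num_blocks extra weights.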