Add swiglu and squared relu activations and the ability to disable bias

See merge request ADLR/megatron-lm!553
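
For readers unfamiliar with the two activations, here is a minimal pure-Python sketch of what they usually compute, plus a linear layer with an optional bias. It is not the code from this merge request: the function names, the half-split convention in `swiglu`, and the `bias=None` switch are assumptions made for illustration only.

```python
import math


def squared_relu(x: float) -> float:
    """Squared ReLU: max(x, 0) squared."""
    return max(x, 0.0) ** 2


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def swiglu(xs: list[float]) -> list[float]:
    """SwiGLU on one feature vector.

    Assumed convention: the first half is the gate, the second half is the
    value, and the output is silu(gate) * value, so it has half the width.
    """
    if len(xs) % 2 != 0:
        raise ValueError("swiglu expects an even number of features")
    half = len(xs) // 2
    gate, value = xs[:half], xs[half:]
    return [silu(g) * v for g, v in zip(gate, value)]


def linear(x, weight, bias=None):
    """Dense layer y = W x (+ b). Passing bias=None disables the bias term."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out
```

Because `swiglu` halves the feature width, the projection feeding it has to produce twice as many features as the activation is meant to output.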