"git@developer.sourcefind.cn:gaoqiong/flash-attention.git" did not exist on "496e4f528c647aa29fb5fb8e8e989078ebe7ef6c"
Added Mish Activation Function
Mish is a new activation function proposed in https://arxiv.org/abs/1908.08681. It has seen some recent success and has been adopted in SpaCy (Thinc), TensorFlow Addons, and fastai-dev. All benchmarks recorded so far (including comparisons against ReLU, Swish, and GELU) are available in the repository: https://github.com/digantamisra98/Mish. It might be a good addition to experiment with, especially in the BERT model.
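For reference, Mish is defined as x * tanh(softplus(x)). A minimal standalone sketch in plain Python (the repository linked above provides framework implementations, e.g. for PyTorch; this scalar version is only for illustration):

```python
import math

def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)."""
    # Numerically stable softplus: for large x, ln(1 + e^x) is effectively x,
    # and computing exp(x) directly would overflow.
    sp = x if x > 20.0 else math.log1p(math.exp(x))
    return x * math.tanh(sp)
```

Like Swish, it is smooth and non-monotonic: it passes slightly negative values through for small negative inputs instead of zeroing them out as ReLU does.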