Paul Fultz II authored
Improves performance for add_gelu. In BERT it is 4x faster, and mul_add is 50% faster than what we currently have.
ddbbe54b