Conditionally enable GeLU approximation (#1810)
The sigmoid approximation for GeLU was introduced in #1299 for FP16. It is known to give better performance but lower accuracy than the exact form; see https://arxiv.org/pdf/1606.08415.pdf
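As a rough illustration of the trade-off, the sketch below compares the exact GeLU with the sigmoid approximation x * sigmoid(1.702 * x) given in the linked paper. It is not the code from this change: the function names and the `use_sigmoid_approx` flag are hypothetical, and it assumes that the approximation enabled here is the one from the paper.

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GeLU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_sigmoid_approx(x: float) -> float:
    # Sigmoid approximation from the paper: x * sigmoid(1.702 * x).
    # sigmoid(z) is written as 0.5 * (1 + tanh(z / 2)) so that large |z|
    # cannot overflow math.exp.
    z = 1.702 * x
    return x * 0.5 * (1.0 + math.tanh(0.5 * z))


def gelu(x: float, use_sigmoid_approx: bool = False) -> float:
    # Hypothetical switch: the approximation is used only when requested.
    if use_sigmoid_approx:
        return gelu_sigmoid_approx(x)
    return gelu_exact(x)


def max_abs_error(lo: float = -6.0, hi: float = 6.0, steps: int = 1201) -> float:
    # Largest absolute gap between the two forms on an evenly spaced grid.
    step = (hi - lo) / (steps - 1)
    points = (lo + i * step for i in range(steps))
    return max(abs(gelu_exact(p) - gelu_sigmoid_approx(p)) for p in points)


if __name__ == "__main__":
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        exact = gelu(value)
        approx = gelu(value, use_sigmoid_approx=True)
        print(f"x={value:+.1f}  exact={exact:+.6f}  approx={approx:+.6f}")
    print(f"max abs error on [-6, 6]: {max_abs_error():.4f}")
```

Keeping the exact form as the default and making the approximation opt-in means accuracy-sensitive runs are unaffected unless the faster path is explicitly requested.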