Fix fp16 Transformer model. (#7402)
Also run Transformer inference in fp16, in addition to training, when --dtype=fp16 is set. In TF 2, a layer can no longer run in multiple different dtypes, so the same dtype must be used for both training and inference.
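A minimal sketch (not the actual model code from this change) of the constraint this describes: in TF 2, a Keras layer's dtype is fixed when the layer is constructed, so a layer built for fp16 training computes in fp16 on the inference path as well.

```python
import tensorflow as tf

# The layer's dtype is fixed at construction; its variables and
# computation are float16 for every subsequent call.
dense = tf.keras.layers.Dense(4, dtype=tf.float16)

x = tf.random.uniform([2, 8], dtype=tf.float16)

# Training-style forward pass under a GradientTape.
with tf.GradientTape() as tape:
    y_train = dense(x)
    loss = tf.reduce_sum(y_train)
grads = tape.gradient(loss, dense.trainable_variables)

# Inference reuses the identical layer object, so it also runs
# in float16; there is no way to run the same layer in float32.
y_infer = dense(x)
print(y_train.dtype, y_infer.dtype)  # float16 float16
```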