    Fix of MHA for TPUs (#636) · ee8bcb17
    Sergey Edunov authored
    Summary:
    Multi-head attention is currently not TPU-friendly; in particular, .data_ptr() is not supported on TPUs and should not be used. There are also potential correctness issues in the existing code (e.g. data_ptr() can point to the same storage for different tensors). Rather than relying on data_ptr(), we should explicitly set the self_attention or encoder_decoder_attention flags.
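    
    For illustration, below is a minimal sketch of the flag-based approach. SimplifiedMultiheadAttention is a hypothetical stand-in, not fairseq's actual MultiheadAttention; only the self_attention / encoder_decoder_attention constructor flags mirror what this change adds.
    
        import torch
        import torch.nn as nn
    
        class SimplifiedMultiheadAttention(nn.Module):
            # Illustrative sketch: dispatch on explicit flags set at construction
            # time instead of comparing tensor .data_ptr() values at forward time.
            def __init__(self, embed_dim, num_heads,
                         self_attention=False, encoder_decoder_attention=False):
                super().__init__()
                assert not (self_attention and encoder_decoder_attention)
                self.self_attention = self_attention
                self.encoder_decoder_attention = encoder_decoder_attention
                self.attn = nn.MultiheadAttention(embed_dim, num_heads)
    
            def forward(self, query, key=None, value=None):
                # Old, TPU-unfriendly check (and potentially wrong when distinct
                # tensors happen to share storage):
                #   qkv_same = query.data_ptr() == key.data_ptr() == value.data_ptr()
                # New approach: trust the flags the caller declared explicitly.
                if self.self_attention:
                    key = value = query
                elif self.encoder_decoder_attention:
                    # key and value both come from the encoder output
                    value = key
                attn_out, _ = self.attn(query, key, value)
                return attn_out
    
        # Usage: a decoder self-attention layer declares its role up front.
        self_attn = SimplifiedMultiheadAttention(512, 8, self_attention=True)
        x = torch.randn(10, 2, 512)   # (seq_len, batch, embed_dim)
        out = self_attn(x)            # no .data_ptr() comparisons needed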
    Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/636
    
    Reviewed By: myleott
    
    Differential Revision: D15709898
    
    Pulled By: edunov
    
    fbshipit-source-id: f931713193c51be848a5de20da730ac3a3ce0187
transformer.py 35.3 KB