    Improve memory efficiency of FP16 optimization (#404) · 03a57dec
    Myle Ott authored
    Summary:
    Previously, when training with --fp16, we stored a separate FP32 copy of the model parameters for optimization, which consumed a lot of memory. An alternative is to do the conversions to FP32 on the fly, as the optimizer needs them, which allows the caching allocator to reuse that memory.
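    
    A minimal sketch of the two strategies, assuming a plain SGD update (the function names and the update rule are illustrative placeholders, not fairseq's actual FP16 optimizer code):
    
        import torch
    
        def step_with_fp32_copy(params_fp16, master_fp32, lr=1e-3):
            # Old approach: a persistent FP32 master copy of every parameter
            # lives for the whole run (an extra 4 bytes per parameter).
            for p16, p32 in zip(params_fp16, master_fp32):
                p32.add_(p16.grad.float(), alpha=-lr)  # update in FP32
                p16.data.copy_(p32)                    # write back as FP16
    
        def step_on_the_fly(params_fp16, lr=1e-3):
            # New approach: upcast to FP32 only for the duration of the update.
            # The temporaries are freed at the end of each loop iteration, so
            # the caching allocator can reuse that memory elsewhere.
            for p16 in params_fp16:
                p32 = p16.data.float()    # temporary FP32 weights
                g32 = p16.grad.float()    # temporary FP32 gradient
                p32.add_(g32, alpha=-lr)  # update in FP32 for accuracy
                p16.data.copy_(p32)       # write back as FP16
                # p32 and g32 go out of scope here; their memory is reused
    
        # Toy usage: one FP16 parameter with a dummy gradient.
        p = torch.zeros(4, dtype=torch.float16)
        p.grad = torch.ones(4, dtype=torch.float16)
        step_on_the_fly([p])
    
    The on-the-fly version trades a small amount of extra conversion work per step for not holding a second full-precision copy of the model for the whole run.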
    
    This reduces peak memory usage by ~20%, with a negligible reduction in training speed (~2% slower), when training a big Transformer on 8 GPUs on WMT En-De with --update-freq=16.
    
    This does not affect convergence, i.e., models will train exactly as they did before.
    Pull Request resolved: https://github.com/pytorch/fairseq/pull/404
    
    Differential Revision: D13394376
    
    Pulled By: myleott
    
    fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf