    Add periodic CUDA cache cleanup (#882) · 315c463d
    Jerry Ma authored
    Summary:
    This adds a periodic call to `torch.cuda.empty_cache()` in order to
    mitigate memory fragmentation in the PyTorch CUDA caching allocator,
    which can cause OOMs on models approaching the GPU memory limit.
    By default, the cache is cleared every 64 updates.
    
    Performance considerations:
    
    - I've benchmarked this on a reasonably large model with a ~16 GB memory
      footprint, and the overhead with the default setting is <0.2%. With
      `update-freq > 1`, the cost is reduced even further.
    - This behavior can be disabled entirely by setting the interval to zero.
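    For illustration, a minimal sketch of the mechanism, assuming a hypothetical
    `empty_cache_freq` option and a simplified trainer skeleton (not the exact
    fairseq `trainer.py` code):

    ```python
    # Minimal sketch: clear the CUDA cache every `empty_cache_freq` updates.
    # The option name and Trainer skeleton here are assumptions for illustration.
    import torch

    class Trainer:
        def __init__(self, empty_cache_freq=64):
            self.empty_cache_freq = empty_cache_freq  # 0 disables the cleanup
            self.num_updates = 0

        def train_step(self, samples):
            # ... forward / backward / optimizer step elided ...
            self.num_updates += 1
            if (
                self.empty_cache_freq > 0
                and self.num_updates % self.empty_cache_freq == 0
                and torch.cuda.is_available()
            ):
                # Return cached blocks to the allocator pool to reduce
                # fragmentation-induced OOMs near the GPU memory limit.
                torch.cuda.empty_cache()
    ```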
    Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/882
    
    Differential Revision: D17742386
    
    Pulled By: jma127
    
    fbshipit-source-id: 68d8f93f798d6818b5efc3d67d43b52dfb8b2865