Commit 315c463d authored by Jerry Ma, committed by Facebook GitHub Bot

Add periodic CUDA cache cleanup (#882)

Summary:
This adds a periodic call to `torch.cuda.empty_cache()` in order to
mitigate memory fragmentation in the PyTorch CUDA caching allocator,
which can cause OOMs on models approaching the GPU memory limit.
By default, this will occur every 64 updates.

Performance considerations:

- I've benchmarked this on a reasonably large model with a 16 GB memory
  footprint; the overhead with the default setting is <0.2%.
  With `--update-freq > 1`, the cost is mitigated even further.
- This behavior can be disabled by setting `--empty-cache-freq` to zero
  (see the sketch below for the exact trigger condition).
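
For reference, here is a minimal standalone sketch of the trigger condition
used in the trainer change below. The helper name `maybe_empty_cache` and its
argument names are illustrative only and are not part of this patch.

import torch

def maybe_empty_cache(num_updates, empty_cache_freq, cpu=False):
    """Clear the CUDA cache once every `empty_cache_freq` updates.

    Mirrors the condition in the patch: with freq F > 0, the check
    (n + F - 1) % F == 0 fires on updates 1, F + 1, 2*F + 1, ...,
    i.e. once every F updates. A freq of 0 disables it entirely.
    """
    if (empty_cache_freq > 0 and
            (num_updates + empty_cache_freq - 1) % empty_cache_freq == 0 and
            torch.cuda.is_available() and
            not cpu):
        torch.cuda.empty_cache()

Guarding on `torch.cuda.is_available()` and the CPU flag keeps the check a
no-op for CPU-only runs, so the only cost on GPU runs is the occasional
allocator flush itself.
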
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/882

Differential Revision: D17742386

Pulled By: jma127

fbshipit-source-id: 68d8f93f798d6818b5efc3d67d43b52dfb8b2865
parent de348d1f
@@ -193,6 +193,8 @@ def get_parser(desc, default_task='translation'):
                         help='threshold FP16 loss scale from below')
     parser.add_argument('--user-dir', default=None,
                         help='path to a python module containing custom extensions (tasks and/or architectures)')
+    parser.add_argument('--empty-cache-freq', default=0, type=int,
+                        help='how often to clear the PyTorch CUDA cache (0 to disable)')
 
     from fairseq.registry import REGISTRIES
     for registry_name, REGISTRY in REGISTRIES.items():
@@ -426,6 +426,14 @@ class Trainer(object):
             if 'nll_loss' in logging_output:
                 self.meters['train_nll_loss'].update(logging_output.get('nll_loss', 0), ntokens)
+
+            # clear CUDA cache to reduce memory fragmentation
+            if (self.args.empty_cache_freq > 0 and
+                    ((self.get_num_updates() + self.args.empty_cache_freq - 1) %
+                    self.args.empty_cache_freq) == 0 and
+                    torch.cuda.is_available() and
+                    not self.args.cpu):
+                torch.cuda.empty_cache()
         except OverflowError as e:
             print('| WARNING: overflow detected, ' + str(e))
             self.zero_grad()