Add a GPU memory snapshot profiler in d2go
Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/542

## Overview
Add an option to enable the GPU memory snapshot profiler in d2go. The profiler is natively supported by PyTorch and records the stack traces associated with all CUDA memory allocation/free events, allowing users to see which parts of the code contribute to a memory bottleneck. It also provides an interactive web tool that visualizes memory utilization over time:

{F978609840}

Each colored block represents an allocated CUDA memory block. Users can click a block to see the Python stack trace that allocated it.

## d2go integration
This diff integrates the profiler as a hook controlled by the config key `USE_MEMORY_PROFILER`. The profiler logs snapshots and web tools to the output directory. Logging can happen at three points: at the start of training, during training, and on OOM. Please read the docstring of `D2GoGpuMemorySnapshot` for more information.

Reviewed By: tglik, jaconey

Differential Revision: D45673764

fbshipit-source-id: 8900484a2266d94421fe3ee7a85a4dea3a9f6b72
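## Illustrative sketch
The three logging points described above (start of training, during training, and OOM) can be sketched as a small hook. This is a minimal illustration, not the code of `D2GoGpuMemorySnapshot`: the class name `MemorySnapshotHookSketch`, the `snapshot_at_step` parameter, the `run_step` wrapper, and the snapshot file names are all hypothetical. The dump function is injected so the control flow can be exercised without a GPU. `torch_dump_fn` shows one assumed way to wire it to PyTorch's private snapshot API, whose signatures vary across PyTorch versions.

```python
import os
import tempfile
from typing import Callable, List


class MemorySnapshotHookSketch:
    """Hypothetical hook: dumps a snapshot at train start, at one chosen step, and on OOM."""

    def __init__(self, output_dir: str, dump_fn: Callable[[str], None],
                 snapshot_at_step: int = 100):
        self.output_dir = output_dir
        self.dump_fn = dump_fn                    # writes a snapshot to the given path
        self.snapshot_at_step = snapshot_at_step  # hypothetical knob, not a d2go config key
        self.step = 0
        self.dumped: List[str] = []

    def _dump(self, tag: str) -> str:
        os.makedirs(self.output_dir, exist_ok=True)
        path = os.path.join(self.output_dir, f"memory_snapshot_{tag}.pickle")
        self.dump_fn(path)
        self.dumped.append(path)
        return path

    def before_train(self) -> None:
        # Logging point 1: start of training.
        self._dump("train_start")

    def after_step(self) -> None:
        # Logging point 2: once during training, at the configured step.
        self.step += 1
        if self.step == self.snapshot_at_step:
            self._dump(f"step_{self.step}")

    def run_step(self, step_fn: Callable[[], None]) -> None:
        # Logging point 3: on OOM. CUDA OOM errors are RuntimeErrors whose
        # message contains "out of memory"; the error is re-raised after dumping.
        try:
            step_fn()
        except RuntimeError as err:
            if "out of memory" in str(err).lower():
                self._dump("oom")
            raise
        self.after_step()


def torch_dump_fn(path: str) -> None:
    """Assumed wiring to PyTorch's private API (needs a CUDA build of PyTorch).

    History recording must be enabled beforehand, e.g. with
    torch.cuda.memory._record_memory_history(); the arguments of these
    private functions differ between PyTorch releases.
    """
    import torch  # imported lazily so the sketch loads without PyTorch

    torch.cuda.memory._dump_snapshot(path)
```

In a real run, one would pass `torch_dump_fn` as `dump_fn` and call `run_step` around each training iteration; a stub such as `list.append` is enough to check which snapshots would be written.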