    [DistDGL][Mem_Optimizations]Edge Ownership processes are computed on the fly when required. (#5225) · e25f47de
    kylasa authored
    * Edge Ownership processes are computed on the fly when required.
    
Previously, we stored the edge ownership process IDs immediately after the dataset was read from disk. For massively large datasets, each node can handle up to 5 billion edges, so storing one 8-byte owner process ID per edge consumes 5 billion * 8 bytes = 40 GB. This memory stays allocated until the edges are exchanged.
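
A quick check of the arithmetic above, assuming each owner process ID is stored as an 8-byte (int64) integer, one per edge. `owner_id_bytes` is a hypothetical helper for illustration, not part of DistDGL:

```python
# Hypothetical helper: memory needed to keep one owner process ID per edge.
BYTES_PER_OWNER_ID = 8  # assumption: owner IDs are stored as int64


def owner_id_bytes(num_edges: int, bytes_per_id: int = BYTES_PER_OWNER_ID) -> int:
    """Return the bytes consumed by storing one owner ID for every edge."""
    return num_edges * bytes_per_id


edges_per_node = 5 * 10**9  # up to 5 billion edges on one node
print(owner_id_bytes(edges_per_node) / 10**9, "GB")  # 40.0 GB (decimal units)
```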
    
To reduce the memory footprint of the pipeline, we no longer store the owner process IDs in the 'edge_data' dictionary after reading the dataset from disk. Instead, we compute them on the fly when the edges are exchanged.
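
A minimal sketch of computing ownership on demand, assuming (hypothetically) that an edge is owned by the rank that owns its destination node and that each rank owns a contiguous range of global node IDs described by an `offsets` array. The real pipeline may use a different mapping; `compute_edge_owners` and `offsets` are illustrative names only:

```python
from bisect import bisect_right
from typing import List, Sequence


def compute_edge_owners(dst_ids: Sequence[int], offsets: Sequence[int]) -> List[int]:
    """Derive each edge's owner rank from its destination node ID.

    Hypothetical layout: rank r owns the contiguous global node-ID range
    [offsets[r], offsets[r + 1]), so offsets[-1] is the total node count.
    """
    owners = []
    for nid in dst_ids:
        if not offsets[0] <= nid < offsets[-1]:
            raise ValueError(f"node ID {nid} is outside [{offsets[0]}, {offsets[-1]})")
        # bisect_right finds the first offset greater than nid; the owning
        # rank is the range that starts just before it.
        owners.append(bisect_right(offsets, nid) - 1)
    return owners


# Owners are produced only when the exchange needs them, then discarded.
print(compute_edge_owners([0, 3, 4, 9, 10, 11], offsets=[0, 4, 10, 12]))
# [0, 0, 1, 1, 2, 2]
```

The trade-off is a binary search per edge (O(log P) for P ranks) at exchange time, in exchange for not keeping a per-edge owner array alive between reading and shuffling.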
    
Another optimization is to avoid sending and receiving all edges in a single large message. Instead, we now split the edges into chunks, each limited to 8 GB per node, and iterate until all chunks have been exchanged.
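
One way to express the chunking, as a sketch: the per-edge payload size (`bytes_per_edge`) and the use of decimal gigabytes are assumptions, and all helper names are hypothetical. If the exchange is built on collective calls, every rank must run the same number of rounds, so the round count is taken as the maximum over ranks (ranks that run out of edges would send empty chunks):

```python
from typing import Iterator, Sequence, Tuple

GB = 10**9  # decimal gigabyte; the exact unit is an assumption


def max_edges_per_chunk(bytes_per_edge: int, limit_bytes: int = 8 * GB) -> int:
    """Largest number of edges whose payload fits under the per-node limit."""
    return max(1, limit_bytes // bytes_per_edge)


def edge_chunks(num_edges: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield [start, end) index ranges covering all local edges."""
    for start in range(0, num_edges, chunk_size):
        yield start, min(start + chunk_size, num_edges)


def num_rounds(local_edge_counts: Sequence[int], chunk_size: int) -> int:
    """Rounds needed so every rank finishes; uses integer ceiling division."""
    return max((n + chunk_size - 1) // chunk_size for n in local_edge_counts)


size = max_edges_per_chunk(bytes_per_edge=16, limit_bytes=64)  # 4 edges per chunk
print(list(edge_chunks(10, size)))  # [(0, 4), (4, 8), (8, 10)]
print(num_rounds([10, 3, 0], size))  # 3
```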
    
Once all edges are exchanged, as a sanity check, we compute the total number of edges in the system and, in a final assert statement before returning the result to the caller, compare it with the original total from before edge shuffling.
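
A single-process sketch of the conservation check. The all-to-all is simulated by counting how many edges each rank would receive; in a real distributed job, each global total could instead be obtained with a sum all-reduce (for example `torch.distributed.all_reduce` with `ReduceOp.SUM`). The function names are hypothetical:

```python
from collections import Counter
from typing import List, Sequence


def exchange_edge_counts(owners_per_rank: Sequence[Sequence[int]], world_size: int) -> List[int]:
    """Single-process stand-in for the all-to-all: edges received per rank."""
    received: Counter = Counter()
    for owners in owners_per_rank:
        received.update(owners)
    # Owners outside [0, world_size) are never delivered to any rank.
    return [received[rank] for rank in range(world_size)]


def shuffle_with_sanity_check(owners_per_rank: Sequence[Sequence[int]], world_size: int) -> List[int]:
    # In a real job each total would come from an all-reduce (SUM) over ranks.
    total_before = sum(len(owners) for owners in owners_per_rank)
    counts_after = exchange_edge_counts(owners_per_rank, world_size)
    total_after = sum(counts_after)
    assert total_before == total_after, (
        f"edge count changed during shuffle: {total_before} -> {total_after}"
    )
    return counts_after


print(shuffle_with_sanity_check([[0, 1, 1], [2, 0], [1]], world_size=3))  # [2, 3, 1]
```

An owner ID outside the valid rank range makes the totals disagree, so the final assert catches edges that would otherwise be silently dropped.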
    
    * Applying lintrunner patch.