[DistDGL][Mem_Optimizations]get_partition_ids, service provided by the distributed lookup service has high memory footprint (#5226)

* get_partition_ids, service provided by the distributed lookup service has high memory footprint

The 'get_partition_ids' function, which retrieves the owner processes for a given list of global node ids, has a high memory footprint. Currently it is on the order of 8x the size of the input list. For massively large datasets, these memory needs are unrealistic and may result in OOM. In the case of CoreGraph, when retrieving the owners for an edge list of 6 billion edges, the memory needs can be as high as 6 * 8 * 8 = 384 GB.

To limit the memory used by this function, we split the message sent to the distributed lookup service so that each message carries at most 200 million global node ids. This reduces the memory footprint of the entire function to no more than 0.2 * 8 * 8 = 12.8 GB (about 13 GB), which is within reasonable limits.

Because we now send multiple small messages to the distributed lookup service instead of one large message, this may consume more wall-clock time than the earlier implementation.

* lintrunner patch.

* using np.ceil() per suggestion.

* converting the output of np.ceil() to ints.
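The chunking idea described in this commit can be illustrated with a minimal sketch. This is not the DGL implementation: `lookup_in_chunks`, `lookup_fn`, `toy_owner`, and `max_ids_per_msg` are hypothetical names introduced here, and the real service exchanges messages between processes rather than calling a local function. The sketch only shows how the number of messages might be computed with np.ceil() (cast to int, since np.ceil() returns a float) and how the per-chunk results are stitched back together.

```python
import numpy as np


def lookup_in_chunks(global_nids, lookup_fn, max_ids_per_msg=200_000_000):
    """Resolve owners for global_nids by calling lookup_fn on bounded chunks.

    lookup_fn is a hypothetical stand-in for one request to the lookup
    service; it takes an array of ids and returns an array of owner ids.
    """
    global_nids = np.asarray(global_nids)
    if global_nids.size == 0:
        return np.empty(0, dtype=np.int64)

    # np.ceil returns a float, so cast to int before using it as a count.
    num_chunks = int(np.ceil(global_nids.size / max_ids_per_msg))

    results = []
    for i in range(num_chunks):
        start = i * max_ids_per_msg
        end = min(start + max_ids_per_msg, global_nids.size)
        # Only this slice is in flight at a time, which bounds peak memory.
        results.append(lookup_fn(global_nids[start:end]))
    return np.concatenate(results)


def toy_owner(nids, num_parts=4):
    """Toy ownership rule (assumption for illustration): id modulo partitions."""
    return nids % num_parts


owners = lookup_in_chunks(np.arange(10), toy_owner, max_ids_per_msg=3)
print(owners.tolist())  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```

With 10 ids and a cap of 3 ids per message, the sketch issues 4 calls instead of 1, which mirrors the trade-off noted above: lower peak memory in exchange for more round trips.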