    [Distributed] Change for the new input format for distributed partitioning (#4273) · 7f8e1cf2
    kylasa authored
    * Code changes to address the updated file format support for massively large graphs.
    
    1. Updated the docstring for the starting function 'gen_dist_partitions' to describe the newly proposed file format for the input dataset.
    2. Code that depended on the structure of the old metadata JSON object has been updated to read from the newly proposed metadata file.
    3. Fixed errors in calls where the calling function expects return values from the invoked function.
    4. The modified code has been tested on the "mag" dataset using 4-way partitions, and the results were verified.
    
    * Code changes to address the CI review comments
    
    1. Improved docstrings for some functions.
    2. Added a new function in utils.py to compute the id ranges; it is used in multiple places.
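
For illustration only, the kind of id-range helper described in item 2 might look like the sketch below. The function name `compute_id_ranges`, its parameters, and the half-open range convention are assumptions made for this example; they are not taken from the actual utils.py in this commit.

```python
from itertools import accumulate


def compute_id_ranges(type_names, type_counts):
    """Hypothetical helper: map each type name to a half-open
    [start, end) range of contiguous global ids.

    type_names  -- ordered list of node or edge type names
    type_counts -- number of items of each type, in the same order
    """
    if len(type_names) != len(type_counts):
        raise ValueError("type_names and type_counts must have equal length")

    # Running totals give the exclusive end of each range.
    ends = list(accumulate(type_counts))
    # Each range starts where the previous one ended; the first starts at 0.
    starts = [0] + ends[:-1]
    return {name: (s, e) for name, s, e in zip(type_names, starts, ends)}


# Example with made-up counts: 3 "author" nodes followed by 5 "paper" nodes.
ranges = compute_id_ranges(["author", "paper"], [3, 5])
print(ranges)
```

Centralizing this computation in one function means every caller derives the same boundaries from the same counts, which is presumably why the commit reuses it in multiple places.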
    
    * Added TODO to indicate the redundant data structure.
    
    Because of the new file format changes, one of the dictionaries (node_feature_tids, node_tids) will be redundant. Added TODO text so that this will be removed in the next iteration of code changes.
utils.py 13.5 KB