tools/distpartitioning/convert_partition.py · dad3606ab68243afeb8065d46bbf6e8bbc89dfb7 · OpenDAS / dgl

Support new format for multi-file support in distributed partitioning. (#4217) · dad3606a

kylasa authored Jul 12, 2022

* Code changes for the following

1. Generating node data at each process
2. Reading csv files using pyarrow
3. feature complete code.

* Removed some typo's because of which unit tests were failing

1. Change the file name to correct file name when loading edges from file
2. When storing node-features after shuffling, use the correct key to store the global-nids of node features which are received after transmitted.

* Code changes to address CI comments by reviewers

1. Removed some redundant code and added text in the doc-strings to describe the functionality of some functions.
2 function signatures and invocations now match w.r.t argument list
3. Added detailed description of the metadata json structure so that the users understand the the type of information present in this file and how it is used through out the code.

* Addressing code review comments

1. Addressed all the CI comments and some of the changes include simplifying the code related to the concatenation of lists and enhancing the docstrings of functions which are changed in this process.

* Update docstring's of two functions appropriately in response to code review comments

Removed "todo" from the docstring of the gen_nodedata function.
Added "todo" to the gen_dist_partitions function when node-id to partition-id's are read for the first time.

Removed 'num-node-weights' from the docstring for the get_dataset function and added schema_map docstring to the argument list.

dad3606a

convert_partition.py 13.1 KB

Replace convert_partition.py