1. 13 Jul, 2022 1 commit
    • Support new format for multi-file support in distributed partitioning. (#4217) · dad3606a
      kylasa authored
      * Code changes for the following
      
      1. Generating node data at each process
      2. Reading csv files using pyarrow
      3. Feature-complete code.
      
      * Fixed some typos that caused unit tests to fail
      
      1. Use the correct file name when loading edges from file.
      2. When storing node features after shuffling, use the correct key to store the global-nids of the node features received from other processes.
      
      * Code changes to address CI comments by reviewers
      
      1. Removed some redundant code and added text in the doc-strings to describe the functionality of some functions.
      2. Function signatures and invocations now match with respect to their argument lists.
      3. Added a detailed description of the metadata json structure so that users understand the type of information present in this file and how it is used throughout the code.
      
      * Addressing code review comments
      
      1. Addressed all the CI comments. The changes include simplifying the code that concatenates lists and enhancing the docstrings of the functions changed in the process.
      
      * Update docstrings of two functions in response to code review comments
      
      Removed "todo" from the docstring of the gen_nodedata function.
      Added "todo" to the gen_dist_partitions function at the point where the node-id to partition-id mappings are read for the first time.
      
      Removed 'num-node-weights' from the docstring for the get_dataset function and added schema_map docstring to the argument list.
  2. 05 Jul, 2022 1 commit
    • Added code to support multiple-file-support feature and removed singl… (#4188) · 9948ef4d
      kylasa authored
      * Added code to support multiple-file-support feature and removed single-file-support code
      
      1. Added code to read dataset in multiple-file-format
      2. Removed code for single-file format
      
      * added files missing in the previous commit
      
      This commit includes dataset_utils.py, which reads the dataset in multiple-file format; gloo_wrapper function calls that support exchanging dictionaries as objects; and helper functions in utils.py.
      
      * Update convert_partition.py
      
      Updated the call to the "create_metadata_json" function to include partition_id, so that each rank creates only its own metadata object. These objects are later accumulated on rank 0 to create the graph-level metadata json file.
      
      * Addressing code review comments during the CI process

      Code changes resulting from the code review comments received during the CI process.
      
      * Code reorganization
      
      Addressing CI comments and code reorganization for easier understanding.
      
      * Removed commented out line
      
      removed commented out line.
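
The convert_partition.py update above describes each rank creating its own metadata object, with the objects later accumulated on rank 0 into a graph-level metadata json file. The following is a rough sketch of that flow under stated assumptions: the gather step is simulated with a plain list (the commit itself exchanges dictionaries through gloo_wrapper calls), and the function names and metadata fields shown are hypothetical, not the real schema.

```python
import json


def create_rank_metadata(partition_id, num_nodes, num_edges):
    """Hypothetical stand-in: the metadata one rank builds for its own partition."""
    return {
        f"part-{partition_id}": {
            "num_nodes": num_nodes,
            "num_edges": num_edges,
        }
    }


def merge_on_rank0(gathered):
    """Combine the per-rank dicts as rank 0 would see them after a gather."""
    graph_meta = {"num_parts": len(gathered), "num_nodes": 0, "num_edges": 0}
    for rank_meta in gathered:
        for part_name, part in rank_meta.items():
            graph_meta[part_name] = part
            graph_meta["num_nodes"] += part["num_nodes"]
            graph_meta["num_edges"] += part["num_edges"]
    return graph_meta


# Simulated gather: one metadata object per rank, ordered by rank.
gathered = [create_rank_metadata(0, 3, 4), create_rank_metadata(1, 2, 5)]
graph_meta = merge_on_rank0(gathered)
print(json.dumps(graph_meta, indent=2))
```

Keying each entry by partition id keeps the per-rank objects from colliding when they are merged, which is one plausible reason for passing partition_id into the metadata call.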