1. 06 Mar, 2023 1 commit
    • Support for no. of chunks smaller than no. of partitions. (#5390) · 894ad1e3
      kylasa authored
      * Support for no. of chunks smaller than no. of partitions and Adding appropriate test cases.
      
      Following changes are made with this PR.
      1. Code changes for handling no. of chunks smaller than no. of partitions
      2. Adding new test cases, which were previously deleted, for no. of chunks smaller than no. of partitions.
      3. Also adding test cases, where multiple partitions are handled by a single process.
      
      * Committing the missing files in this commit.
      
      * lintrunner patch.
      
      * lintrunner check
      
      * lintrunner patch here.
      
      * CI review comments.
  2. 25 Feb, 2023 1 commit
    • [DistDGL][Feature_Request]Changes in the metadata.json file for input graph dataset. (#5310) · a14f69c9
      kylasa authored
      * Implemented the following changes.
      
      * Remove NUM_NODES_PER_CHUNK
      * Remove NUM_EDGES_PER_CHUNK
      * Remove the dependency between no. of edge files per edge type and no. of partitions
      * Remove the dependency between no. of edge feature files per edge type and no. of partitions
      * Remove the dependency between no. of edge feature files and no. of edge files per edge type.
      * Remove the dependency between no. of node feature files and no. of partitions
      * Add “node_type_counts”. This will be a list of integers. Each integer will represent total count of a node-type. The index in this list and the index in the “node_type” will be the same for a given node-type.
      * Add “edge_type_counts”. This will be a list of integers. Each integer will represent total count of an edge-type. The index in this list and the index in the “edge_type” list will be the same for a given edge-type.
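The index correspondence described for "node_type_counts" and "edge_type_counts" can be pictured with a minimal sketch. The type names and counts below are made up for illustration, and `counts_by_type` is a hypothetical helper, not a function from the pipeline.

```python
# Hypothetical metadata fragment; the type names and counts are invented.
metadata = {
    "node_type": ["author", "paper"],
    "node_type_counts": [3, 5],
    "edge_type": ["author:writes:paper", "paper:cites:paper"],
    "edge_type_counts": [7, 4],
}


def counts_by_type(meta, type_key, count_key):
    """Pair each type name with the count stored at the same list index."""
    names, counts = meta[type_key], meta[count_key]
    if len(names) != len(counts):
        raise ValueError(f"{type_key} and {count_key} must have equal length")
    return dict(zip(names, counts))


node_counts = counts_by_type(metadata, "node_type", "node_type_counts")
edge_counts = counts_by_type(metadata, "edge_type", "edge_type_counts")
```

Because the two lists are matched purely by position, a length check like the one above is a cheap way to catch a malformed metadata file early.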
      
      * Applying lintrunner patch.
      
      * Adding missing keys to the metadata in the unit test framework.
      
      * lintrunner patch.
      
      * Resolving CI test failures due to merge conflicts.
      
      * Applying lintrunner patch
      
      * applying lintrunner patch
      
      * Replacing tabspace with spaces - to satisfy lintrunner
      
      * Fixing the CI Test Failure cases.
      
      * Applying lintrunner patch
      
      * lintrunner complaining about a blank line.
      
      * Resolving issues with print statement for NoneType
      
      * Removed the arbitrary-chunks tests, since this functionality is not supported anymore.
      
      * Addressing CI review comments.
      
      * addressing CI review comments
      
      * lintrunner patch
      
      * lintrunner patch.
      
      * Addressing CI review comments.
      
      * lintrunner patch.
  3. 13 Feb, 2023 1 commit
    • Code changes to fix order sensitivity of the pipeline (#5288) · 432c71ef
      kylasa authored
      
      
      Following changes are made in this PR.
      1. In dataset_utils.py, edges are now read from disk in the order defined by the STR_EDGE_TYPE key in the metadata.json file. This is the same order that is implicitly used to assign edge ids to edge types.
      2. The unit test framework now also randomizes the order in which edges are read from disk.
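As a rough illustration of the order sensitivity being fixed, the sketch below assumes that the id of an edge type is simply its index in the metadata list. The edge-type names and the helpers `edge_type_ids` and `in_metadata_order` are hypothetical and are not taken from dataset_utils.py.

```python
import random


def edge_type_ids(edge_types):
    """Assumed convention: an edge type's id is its index in the metadata list."""
    return {name: idx for idx, name in enumerate(edge_types)}


def in_metadata_order(edge_types, edges_by_type):
    """Return (edge type, edges) pairs in metadata order, whatever order they arrived in."""
    return [(name, edges_by_type[name]) for name in edge_types]


# Order as it would appear in the metadata file (names invented).
edge_types = ["cites", "writes", "affiliated_with"]

# Simulate reading the per-type edge lists from disk in a random order.
on_disk = {
    "writes": [(0, 1)],
    "affiliated_with": [(2, 3)],
    "cites": [(4, 5), (6, 7)],
}
shuffled = list(on_disk.items())
random.shuffle(shuffled)

ordered = in_metadata_order(edge_types, dict(shuffled))
```

Whatever order the shuffle produces, `ordered` follows the metadata list, so any id derived from position stays stable across runs.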
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
  4. 03 Feb, 2023 1 commit
  5. 03 Jan, 2023 1 commit
  6. 15 Dec, 2022 1 commit
    • [Dist] enable to chunk node/edge data into arbitrary number of chunks (#4930) · 9731e023
      Rhett Ying authored
      
      
      * [Dist] enable to chunk node/edge data into arbitrary number of chunks
      
      * [Dist] enable to split node/edge data into arbitrary parts
      
      * refine code
      
      * Force-cast boolean to uint8 to avoid dist.scatter failure
      
      * convert boolean to int8 before scatter and revert it after scatter
      
      * refine code
      
      * fix test
      
      * refine code
      
      * move test utilities into utils.py
      
      * update comment
      
      * fix empty data
      
      * update
      
      * update
      
      * fix empty data issue
      
      * release unnecessary mem
      
      * release unnecessary mem
      
      * release unnecessary mem
      
      * release unnecessary mem
      
      * release unnecessary mem
      
      * remove unnecessary shuffle data
      
      * separate array_split into standalone utility
      
      * add example
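One way to picture splitting data into an arbitrary number of chunks is the pure-Python sketch below. It follows the size convention of numpy.array_split (the first `n % k` chunks get one extra item), which is an assumption about what the standalone utility does. `split_sizes` and `split_into_chunks` are hypothetical names.

```python
def split_sizes(num_items, num_chunks):
    """Chunk sizes in the numpy.array_split convention: the first
    (num_items % num_chunks) chunks hold one extra item."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    base, extra = divmod(num_items, num_chunks)
    return [base + 1 if i < extra else base for i in range(num_chunks)]


def split_into_chunks(items, num_chunks):
    """Split a list into num_chunks contiguous pieces; trailing pieces may be empty."""
    chunks, start = [], 0
    for size in split_sizes(len(items), num_chunks):
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

Note that when there are fewer items than chunks, the trailing chunks are empty, which is the kind of case the "fix empty data" entries above appear to address.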
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
  7. 28 Nov, 2022 1 commit
  8. 18 Nov, 2022 1 commit
    • [Dist] Flexible pipeline - Initial commit (#4733) · c8ea9fa4
      kylasa authored
      * Flexible pipeline - Initial commit
      
      1. Implementation of flexible pipeline feature.
      2. With this implementation, the pipeline supports multiple partitions per process, and assumes that num_partitions is always a multiple of num_processes.
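The constraint above can be sketched as follows. The contiguous block assignment is an assumption made for illustration (the pipeline may map partitions to processes differently), and `partitions_for_rank` is a hypothetical helper.

```python
def partitions_for_rank(rank, num_processes, num_partitions):
    """Map local partition ids to global partition ids for one process.

    Assumes a contiguous block assignment; the real pipeline may use
    another scheme (for example, a strided one).
    """
    if num_partitions % num_processes != 0:
        raise ValueError("num_partitions must be a multiple of num_processes")
    if not 0 <= rank < num_processes:
        raise ValueError("rank out of range")
    per_process = num_partitions // num_processes
    return {local: rank * per_process + local for local in range(per_process)}
```

For example, with 4 partitions and 2 processes, rank 1 would own global partitions 2 and 3 under local ids 0 and 1.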
      
      * Update test_dist_part.py
      
      * Code changes to address review comments
      
      * Code refactoring of exchange_features function into two functions for better readability
      
      * Updating test_dist_part to fix merge issues with the master branch
      
      * corrected variable names...
      
      * Fixed code refactoring issues.
      
      * Provide missing function arguments to exchange_feature function
      
      * Providing the missing function argument to fix error.
      
      * Provide missing function argument to 'get_shuffle_nids' function.
      
      * Repositioned a variable within its scope.
      
      * Removed tab space which is causing the indentation problem
      
      * Fix issue with the CI test framework, which is the root cause for the failure of the CI tests.
      
      1. Now we read files specific to the partition-id and store this data separately, identified by the local_part_id, in the local process.
      2. Similarly as above, we also differentiate the node and edge features type_ids with the same keys as above.
      3. The above two changes will help us get the appropriate feature data during the feature exchange and send it to the destination process correctly.
      
      * Correct the parametrization for the CI unit test cases.
      
      * Addressing Rui's code review comments.
      
      * Addressing code review comments.
  9. 07 Nov, 2022 1 commit
  10. 19 Oct, 2022 1 commit
  11. 03 Oct, 2022 1 commit
    • Edge Feature support for input graph datasets for dist. graph partitioning pipeline (#4623) · 1f471396
      kylasa authored
      * Added support for edge features.
      
      * Added comments and removing unnecessary print statements.
      
      * updated data_shuffle.py to remove compile error.
      
      * Replaced python3 with python to match CI test framework.
      
      * Removed unrelated files from the pull request.
      
      * Isort changes.
      
      * black changes on this file.
      
      * Addressing CI review comments.
      
      * Addressing CI comments.
      
      * Removed duplicated and resolved merge conflict code.
      
      * Addressing CI Comments from Rui.
      
      * Addressing CI comments, and fixing merge issues.
      
      * Addressing CI comments, code refactoring, isort and black
  12. 28 Sep, 2022 1 commit
  13. 23 Sep, 2022 1 commit
    • Garbage Collection and memory snapshot code for debugging partitioning... · ace76327
      kylasa authored
      
       Garbage Collection and memory snapshot code for debugging partitioning pipeline (target as master branch) (#4598)
      
      * Squashed commit of the following:
      
      commit e605a550b3783dd5f24eb39b6873a2e0e79be9c7
      Author: kylasa <kylasa@gmail.com>
      Date:   Thu Sep 15 14:45:39 2022 -0700
      
          Delete pyproject.toml
      
      commit f2db9e700d817212b67b5227f6472d218f0c74f2
      Author: kylasa <kylasa@gmail.com>
      Date:   Thu Sep 15 14:44:40 2022 -0700
      
          Changes suggested by isort program to sort imports.
      
      commit 5a6078beac6218a4f1fb378c169f04dda7396425
      Author: kylasa <kylasa@gmail.com>
      Date:   Thu Sep 15 14:39:50 2022 -0700
      
          addressing code review comments from the CI process.
      
      commit c8e92decb7aebeb32c7467108e16f058491443ab
      Author: kylasa <kylasa@gmail.com>
      Date:   Wed Sep 14 18:23:59 2022 -0700
      
          Corrected a typo in the import statement
      
      commit 14ddb0e9b553d5be3ed2c50d82dee671e84ad8c9
      Author: kylasa <kylasa@gmail.com>
      Date:   Tue Sep 13 18:47:34 2022 -0700
      
          Memory snapshot code for debugging memory footprint of the graph partitioning pipeline
      
      Squashed commit done
      
      * Addressing code review comments.
      
      * Update utils.py
      
      * dummy change to trigger CI tests
      Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
  14. 12 Aug, 2022 1 commit
  15. 11 Aug, 2022 2 commits
    • [Dist] New distributed data preparation pipeline (#4386) · 71ce1749
      Minjie Wang authored
      * code changes for bug fixes identified during mag_lsc dataset (#4187)
      
      * code changes for bug fixes identified during mag_lsc dataset
      
      1. Changed the call from torch.Tensor() to torch.from_numpy() to address memory corruption issues when creating large tensors. The tricky part is that torch.Tensor() works correctly for small tensors.
      2. Changed the dgl.graph() function call to include the 'num_nodes' argument to explicitly specify all the nodes in a graph partition.
      
      * Update convert_partition.py
      
      Moved the changes to the "create_metadata_json" function to the "multiple-file-format" support, where this change is more appropriate, since multiple-machine testing was done with these code changes.
      
      * Addressing review comments.
      
      Removed space as suggested at the end of the line
      
      * Revert "Revert "[Distributed Training Pipeline] Initial implementation of Distributed data processing step in the Dis… (#3926)" (#4037)"
      
      This reverts commit 7c598aac.
      
      * Added code to support multiple-file-support feature and removed singl… (#4188)
      
      * Added code to support multiple-file-support feature and removed single-file-support code
      
      1. Added code to read dataset in multiple-file-format
      2. Removed code for single-file format
      
      * added files missing in the previous commit
      
      This commit includes dataset_utils.py, which reads the dataset in multiple-file-format, gloo_wrapper function calls to support exchanging dictionaries as objects and helper functions in utils.py
      
      * Update convert_partition.py
      
      Updated the "create_metadata_json" function call to include partition_id so that each rank only creates its own metadata object; later on these are accumulated on rank-0 to create the graph-level metadata json file.
      
      * addressing code review comments during the CI process
      
      code changes resulting from the code review comments received during the CI process.
      
      * Code reorganization
      
      Addressing CI comments and code reorganization for easier understanding.
      
      * Removed commented out line
      
      removed commented out line.
      
      * Support new format for multi-file support in distributed partitioning. (#4217)
      
      * Code changes for the following
      
      1. Generating node data at each process
      2. Reading csv files using pyarrow
      3. feature complete code.
      
      * Fixed some typos because of which unit tests were failing

      1. Use the correct file name when loading edges from file.
      2. When storing node-features after shuffling, use the correct key to store the global-nids of node features which are received after transmission.
      
      * Code changes to address CI comments by reviewers
      
      1. Removed some redundant code and added text in the doc-strings to describe the functionality of some functions.
      2. Function signatures and invocations now match w.r.t. argument list.
      3. Added a detailed description of the metadata json structure so that users understand the type of information present in this file and how it is used throughout the code.
      
      * Addressing code review comments
      
      1. Addressed all the CI comments and some of the changes include simplifying the code related to the concatenation of lists and enhancing the docstrings of functions which are changed in this process.
      
      * Update docstrings of two functions appropriately in response to code review comments
      
      Removed "todo" from the docstring of the gen_nodedata function.
      Added "todo" to the gen_dist_partitions function where node-id to partition-id mappings are read for the first time.
      
      Removed 'num-node-weights' from the docstring for the get_dataset function and added schema_map docstring to the argument list.
      
      * [Distributed] Change for the new input format for distributed partitioning (#4273)
      
      * Code changes to address the updated file format support for massively large graphs.
      
      1. Updated the docstring for the starting function 'gen_dist_partitions' to describe the newly proposed file format for input dataset.
      2. Code which was dependent on the structure of the old-metadata json object has been updated to read from the newly proposed metadata file.
      3. Fixed some errors when appropriate functions were invoked and the calling function expects return values from the invoked function.
      4. This modified code has been tested on "mag" dataset using 4-way partitions and verified the results
      
      * Code changes to address the CI review comments
      
      1. Improved docstrings for some functions.
      2. Added a new function in the utils.py to compute the id ranges and this is used in multiple places.
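A helper that computes id ranges might look like the sketch below. It assumes the ranges are half-open and laid out back to back from per-type counts, which the commit message does not confirm. `id_ranges` is a hypothetical name, not the function added to utils.py.

```python
from itertools import accumulate


def id_ranges(type_names, type_counts):
    """Return {type name: (start, end)} half-open ranges laid out back to back.

    Assumption: ids for each type are contiguous and types are ordered as given.
    """
    ends = list(accumulate(type_counts))
    starts = [0] + ends[:-1]
    return {name: (s, e) for name, s, e in zip(type_names, starts, ends)}
```

Computing the ranges in one place keeps every caller consistent about where one type's ids end and the next type's begin.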
      
      * Added TODO to indicate the redundant data structure.
      
      Because of the new file format changes, one of the dictionaries (node_feature_tids, node_tids) will be redundant. Added TODO text so that this will be removed in the next iteration of code changes.
      
      * [Distributed] use alltoall fix to bypass gloo - alltoallv bug in distributed partitioning (#4311)
      
      * Alltoall Fix to bypass gloo - alltoallv bug which is preventing further testing
      
      1. Replaced alltoallv gloo wrapper call with alltoall message.
      2. All the messages are padded to be of same length
      3. Receiving side unpads the messages and continues processing.
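The pad, exchange, unpad idea in the three steps above can be simulated in a single process as a rough sketch. No gloo or torch.distributed calls are used here. `simulated_alltoallv` is a hypothetical function, and exchanging the true lengths first is an assumption about how the receiver knows how much padding to strip.

```python
def simulated_alltoallv(send):
    """Single-process simulation; send[src][dst] is the list rank src sends to rank dst.

    Returns recv, where recv[dst][src] is what rank dst received from rank src.
    """
    world = len(send)
    # Assumed first step: every rank learns the true length of each message.
    lengths = [[len(send[src][dst]) for dst in range(world)] for src in range(world)]
    # Pad every message with zeros to one common length.
    width = max((max(row) for row in lengths), default=0)
    padded = [
        [send[src][dst] + [0] * (width - lengths[src][dst]) for dst in range(world)]
        for src in range(world)
    ]
    # A fixed-size alltoall amounts to a transpose of the (src, dst) grid.
    received = [[padded[src][dst] for src in range(world)] for dst in range(world)]
    # The receiving side strips the padding using the true lengths.
    return [
        [received[dst][src][:lengths[src][dst]] for src in range(world)]
        for dst in range(world)
    ]
```

The cost of this workaround is extra traffic: every message is as long as the largest one, so a single oversized message inflates all the others.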
      
      * Code changes to address CI comments
      
      1. Removed unused functions from gloo_wrapper.py
      2. Changed the function signature of alltoallv_cpu_data as suggested.
      3. Added docstring to include more description of the functionality inside alltoallv_cpu_data. Included more asserts to validate the assumptions.
      
      * Changed the function name appropriately
      
      Changed the function name from "alltoallv_cpu_data" to "alltoallv_cpu", which I believe is appropriate because the underlying functionality provides alltoallv, which is basically alltoall_cpu + padding.
      
      * Added code and text to address the review comments.
      
      1. Changed the function name to indicate the local use of this function.
      2. Changed docstring to indicate the assumptions made by alltoallv_cpu function.
      
      * Removed unused function from import statement
      
      Removed unused/removed function from import statement.
      
      * [Distributed] reduce memory consumption in distributed graph partitioning. (#4338)
      
      * Fix for node_subgraph function, which seems to generate segmentation fault for very large partitions
      
      1. Removed three DGL graph objects; the final DGL object is now created directly while maintaining the following constraints:
      a) Nodes are reordered so that local nodes are placed at the beginning of the node list, ahead of non-local nodes.
      b) Edge order is maintained as passed into this function.
      c) src/dst end points are mapped to target values based on the reshuffled node order.
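Constraints a) to c) can be illustrated with a small pure-Python sketch. `reorder_local_first` is a hypothetical function and its inputs are plain lists rather than the tensors and DGL objects the pipeline would use.

```python
def reorder_local_first(node_ids, is_local, src, dst):
    """Place local nodes first, keep edge order, and remap edge end points.

    node_ids: original node ids; is_local: parallel list of booleans;
    src/dst: edge end points expressed in original node ids.
    """
    local = [n for n, flag in zip(node_ids, is_local) if flag]
    non_local = [n for n, flag in zip(node_ids, is_local) if not flag]
    new_order = local + non_local                     # constraint a)
    old_to_new = {old: new for new, old in enumerate(new_order)}
    new_src = [old_to_new[s] for s in src]            # constraints b) and c):
    new_dst = [old_to_new[d] for d in dst]            # same edge order, new ids
    return new_order, new_src, new_dst
```

A single old-to-new lookup like this avoids building intermediate graph objects, which seems to be the spirit of the memory reduction described here.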
      
      * Code changes addressing CI comments for this PR
      
      1. Used Da's suggested map to map nodes from old to new order.
      This is much simpler and mem. efficient.
      
      * Addressing CI Comments.
      
      1. Reduced the amount of documentation to reflect the actual implementation.
      2. named the mapping object appropriately.
      
      * [Distributed] Graph chunking UX (#4365)
      
      * first commit
      
      * update
      
      * huh
      
      * fix
      
      * update
      
      * revert core
      
      * fix
      
      * update
      
      * rewrite
      
      * oops
      
      * address comments
      
      * add graph name
      
      * address comments
      
      * remove sample metadata file
      
      * address comments
      
      * fix
      
      * remove
      
      * add docs
      
      * Adding launch script and wrapper script to trigger distributed graph … (#4276)
      
      * Adding launch script and wrapper script to trigger distributed graph partitioning pipeline as defined in the UX document
      
      1. dispatch_data.py is a wrapper script which builds the command and triggers the distributed partitioning pipeline
      2. distgraphlaunch.py is the main python script which triggers the pipeline; to simplify its usage, dispatch_data.py is included as a wrapper script around it.
      
      * Added code to auto-detect python version and retrieve some parameters from the input metadata json file
      
      1. Auto detect python version
      2. Read the metadata json file and extract some parameters to pass to the user defined command which is used to trigger the pipeline.
      
      * Updated the json file name to metadata.json file per UX documentation
      
      1. Renamed json file name per UX documentation.
      
      * address comments
      
      * fix
      
      * fix doc
      
      * use unbuffered logging to cure anxiety
      
      * cure more anxiety
      
      * Update tools/dispatch_data.py
      Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
      
      * oops
      Co-authored-by: Quan Gan <coin2028@hotmail.com>
      Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
      Co-authored-by: kylasa <kylasa@gmail.com>
      Co-authored-by: Da Zheng <zhengda1936@gmail.com>
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
    • Adding launch script and wrapper script to trigger distributed graph … (#4276) · 8086d1ed
      kylasa authored
      
      
      * Adding launch script and wrapper script to trigger distributed graph partitioning pipeline as defined in the UX document
      
      1. dispatch_data.py is a wrapper script which builds the command and triggers the distributed partitioning pipeline
      2. distgraphlaunch.py is the main python script which triggers the pipeline; to simplify its usage, dispatch_data.py is included as a wrapper script around it.
      
      * Added code to auto-detect python version and retrieve some parameters from the input metadata json file
      
      1. Auto detect python version
      2. Read the metadata json file and extract some parameters to pass to the user defined command which is used to trigger the pipeline.
      
      * Updated the json file name to metadata.json file per UX documentation
      
      1. Renamed json file name per UX documentation.
      
      * address comments
      
      * fix
      
      * fix doc
      
      * use unbuffered logging to cure anxiety
      
      * cure more anxiety
      
      * Update tools/dispatch_data.py
      Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
      
      * oops
      Co-authored-by: Quan Gan <coin2028@hotmail.com>
      Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
  16. 23 Jul, 2022 1 commit
    • [Distributed] Change for the new input format for distributed partitioning (#4273) · 7f8e1cf2
      kylasa authored
      * Code changes to address the updated file format support for massively large graphs.
      
      1. Updated the docstring for the starting function 'gen_dist_partitions' to describe the newly proposed file format for input dataset.
      2. Code which was dependent on the structure of the old-metadata json object has been updated to read from the newly proposed metadata file.
      3. Fixed some errors when appropriate functions were invoked and the calling function expects return values from the invoked function.
      4. This modified code has been tested on "mag" dataset using 4-way partitions and verified the results
      
      * Code changes to address the CI review comments
      
      1. Improved docstrings for some functions.
      2. Added a new function in the utils.py to compute the id ranges and this is used in multiple places.
      
      * Added TODO to indicate the redundant data structure.
      
      Because of the new file format changes, one of the dictionaries (node_feature_tids, node_tids) will be redundant. Added TODO text so that this will be removed in the next iteration of code changes.
  17. 13 Jul, 2022 1 commit
    • Support new format for multi-file support in distributed partitioning. (#4217) · dad3606a
      kylasa authored
      * Code changes for the following
      
      1. Generating node data at each process
      2. Reading csv files using pyarrow
      3. feature complete code.
      
      * Fixed some typos because of which unit tests were failing

      1. Use the correct file name when loading edges from file.
      2. When storing node-features after shuffling, use the correct key to store the global-nids of node features which are received after transmission.
      
      * Code changes to address CI comments by reviewers
      
      1. Removed some redundant code and added text in the doc-strings to describe the functionality of some functions.
      2. Function signatures and invocations now match w.r.t. argument list.
      3. Added a detailed description of the metadata json structure so that users understand the type of information present in this file and how it is used throughout the code.
      
      * Addressing code review comments
      
      1. Addressed all the CI comments and some of the changes include simplifying the code related to the concatenation of lists and enhancing the docstrings of functions which are changed in this process.
      
      * Update docstrings of two functions appropriately in response to code review comments
      
      Removed "todo" from the docstring of the gen_nodedata function.
      Added "todo" to the gen_dist_partitions function where node-id to partition-id mappings are read for the first time.
      
      Removed 'num-node-weights' from the docstring for the get_dataset function and added schema_map docstring to the argument list.
  18. 05 Jul, 2022 1 commit
    • Added code to support multiple-file-support feature and removed singl… (#4188) · 9948ef4d
      kylasa authored
      * Added code to support multiple-file-support feature and removed single-file-support code
      
      1. Added code to read dataset in multiple-file-format
      2. Removed code for single-file format
      
      * added files missing in the previous commit
      
      This commit includes dataset_utils.py, which reads the dataset in multiple-file-format, gloo_wrapper function calls to support exchanging dictionaries as objects and helper functions in utils.py
      
      * Update convert_partition.py
      
      Updated the "create_metadata_json" function call to include partition_id so that each rank only creates its own metadata object; later on these are accumulated on rank-0 to create the graph-level metadata json file.
      
      * addressing code review comments during the CI process
      
      code changes resulting from the code review comments received during the CI process.
      
      * Code reorganization
      
      Addressing CI comments and code reorganization for easier understanding.
      
      * Removed commented out line
      
      removed commented out line.