1. 16 Feb, 2023 1 commit
    • [DistDGL][Mem_Optimizations]Edge Ownership processes are computed on the fly when required. (#5225) · e25f47de
      kylasa authored
      * Edge Ownership processes are computed on the fly when required.
      
      Earlier, we stored the edge ownership process-ids as soon as the dataset was read from the disk. For massively large datasets, each node can handle up to 5 billion edges, so storing the owner process-ids at 8 bytes each consumes 5 billion * 8 bytes = 40 GB. This memory stays allocated until the edges are exchanged.
      
      To reduce the memory footprint of the pipeline, we no longer store the ownership process-ids in the 'edge_data' dictionary after reading the dataset from the disk. Instead, we compute them on the fly at the time of exchanging edges.
      
      Another optimization is to avoid sending/receiving all the edges in one single large message. Instead, we now split the total number of edges into chunks, limited to 8 GB per node, and iterate until all the chunks are exchanged.
      
      Once all the edges are exchanged, as a sanity check, we compute the total number of edges in the system and compare it with the original value from before edge shuffling, in a final assert statement before returning the result to the caller.
      
      * Applying lintrunner patch.
      e25f47de
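The three ideas in this commit message (owners computed on demand, chunked exchange, and a final edge-count assert) can be sketched in a single process as follows. This is only an illustration under stated assumptions: the commit message does not say how an edge's owner is derived, so the sketch assumes an edge belongs to the process that owns its destination node and that nodes are range-partitioned. The names `owner_of`, `exchange_in_chunks`, and `range_starts` are hypothetical and do not come from the DGL code.

```python
from bisect import bisect_right

BYTES_PER_ID = 8  # assumption: one 64-bit integer per stored owner process-id


def stored_owner_bytes(num_edges):
    """Memory needed if an owner process-id were stored for every edge."""
    return num_edges * BYTES_PER_ID


def owner_of(node_id, range_starts):
    """Compute the owner on the fly instead of storing it.

    range_starts[i] is the first node id owned by process i (sorted ascending).
    """
    return bisect_right(range_starts, node_id) - 1


def exchange_in_chunks(dst_ids, range_starts, num_procs, chunk_size):
    """Simulate the exchange: route edges chunk by chunk, then sanity-check."""
    received = [[] for _ in range(num_procs)]
    for start in range(0, len(dst_ids), chunk_size):
        chunk = dst_ids[start:start + chunk_size]
        for dst in chunk:
            # Owner is derived only for the chunk currently being sent.
            received[owner_of(dst, range_starts)].append(dst)
    total_after = sum(len(part) for part in received)
    assert total_after == len(dst_ids), "edge count changed during shuffle"
    return received
```

In a real distributed run each chunk would go through a collective exchange rather than a local list append; the sketch only shows that the owner array never has to exist in full.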
  2. 13 Feb, 2023 1 commit
    • Code changes to fix order sensitivity of the pipeline (#5288) · 432c71ef
      kylasa authored
      
      
      Following changes are made in this PR.
      1. In dataset_utils.py, edges are now read from the disk in the order defined by the STR_EDGE_TYPE key in the metadata.json file. This order is implicitly used to assign edge ids to edge types, so the same order must be used when reading edges from the disk.
      2. The unit test framework now also randomizes the order in which edges are read from the disk.
      Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
      432c71ef
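To see why the read order matters, here is a minimal sketch of assigning contiguous edge-id ranges by walking the edge types in metadata order. The function name `assign_edge_id_ranges` and the shape of its inputs are hypothetical; the actual layout of metadata.json is not shown in the commit message.

```python
def assign_edge_id_ranges(edge_types, edge_counts):
    """Give each edge type a contiguous [start, end) range of edge ids.

    edge_types:  list of edge type names in metadata order (assumed input).
    edge_counts: dict mapping edge type name to its number of edges.
    """
    ranges = {}
    offset = 0
    for etype in edge_types:  # the metadata order alone decides the ids
        count = edge_counts[etype]
        ranges[etype] = (offset, offset + count)
        offset += count
    return ranges
```

Because the ranges depend only on the order of `edge_types`, reading the files in any other order (for example directory listing order) would attach edges to the wrong id range, which is the kind of order sensitivity this commit addresses.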
  3. 10 Feb, 2023 2 commits
  4. 03 Feb, 2023 1 commit
  5. 02 Feb, 2023 1 commit
  6. 05 Jan, 2023 1 commit
  7. 03 Jan, 2023 1 commit
  8. 15 Dec, 2022 1 commit
    • [Dist] enable to chunk node/edge data into arbitrary number of chunks (#4930) · 9731e023
      Rhett Ying authored
      
      
      * [Dist] enable to chunk node/edge data into arbitrary number of chunks
      
      * [Dist] enable to split node/edge data into arbitrary parts
      
      * refine code
      
      * Forcibly format boolean to uint8 to avoid dist.scatter failure
      
      * convert boolean to int8 before scatter and revert it after scatter
      
      * refine code
      
      * fix test
      
      * refine code
      
      * move test utilities into utils.py
      
      * update comment
      
      * fix empty data
      
      * update
      
      * update
      
      * fix empty data issue
      
      * release unnecessary mem
      
      * release unnecessary mem
      
      * release unnecessary mem
      
      * release unnecessary mem
      
      * release unnecessary mem
      
      * remove unnecessary shuffle data
      
      * separate array_split into standalone utility
      
      * add example
      Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
      9731e023
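Splitting data into an arbitrary number of parts, including the empty-data case mentioned in this commit, might look like the sketch below. It follows the sizing convention of `numpy.array_split` (the first `n % k` parts get one extra element) but is written in plain Python; it is not the standalone utility added by the commit, and the names are illustrative.

```python
def split_sizes(n, k):
    """Sizes of k nearly equal parts of n items; earlier parts get the remainder."""
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]


def array_split(items, k):
    """Split a list into exactly k contiguous parts, some possibly empty."""
    parts, start = [], 0
    for size in split_sizes(len(items), k):
        parts.append(items[start:start + size])
        start += size
    return parts
```

Returning exactly `k` parts even for empty input keeps every receiver in a scatter supplied with a (possibly empty) piece.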
  9. 14 Dec, 2022 1 commit
  10. 07 Dec, 2022 1 commit
  11. 30 Nov, 2022 1 commit
  12. 28 Nov, 2022 2 commits
  13. 18 Nov, 2022 1 commit
    • [Dist] Flexible pipeline - Initial commit (#4733) · c8ea9fa4
      kylasa authored
      * Flexible pipeline - Initial commit
      
      1. Implementation of flexible pipeline feature.
      2. With this implementation, the pipeline now supports multiple partitions per process. It also assumes that num_partitions is always a multiple of num_processes.
      
      * Update test_dist_part.py
      
      * Code changes to address review comments
      
      * Code refactoring of exchange_features function into two functions for better readability
      
      * Updating test_dist_part to fix merge issues with the master branch
      
      * corrected variable names...
      
      * Fixed code refactoring issues.
      
      * Provide missing function arguments to exchange_feature function
      
      * Providing the missing function argument to fix error.
      
      * Provide missing function argument to 'get_shuffle_nids' function.
      
      * Repositioned a variable within its scope.
      
      * Removed tab space which is causing the indentation problem
      
      * Fix issue with the CI test framework, which is the root cause for the failure of the CI tests.
      
      1. Now we read files specific to the partition-id and store this data separately, identified by the local_part_id, in the local process.
      2. Similarly, we also differentiate the node and edge feature type_ids using the same keys.
      3. These two changes help us get the appropriate feature data during the feature exchange and send it to the destination process correctly.
      
      * Correct the parametrization for the CI unit test cases.
      
      * Addressing Rui's code review comments.
      
      * Addressing code review comments.
      c8ea9fa4
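One way to picture multiple partitions per process is the round-robin assignment sketched below. The commit message only states that num_partitions must be a multiple of num_processes; the round-robin rule and the function names here are assumptions made for illustration, not the confirmed DGL mapping.

```python
def partitions_for_rank(rank, num_partitions, num_processes):
    """Partition ids handled by one process under an assumed round-robin rule."""
    if num_partitions % num_processes != 0:
        raise ValueError("num_partitions must be a multiple of num_processes")
    return list(range(rank, num_partitions, num_processes))


def owner_and_local_id(part_id, num_processes):
    """Map a global partition id to (owning rank, local_part_id)."""
    return part_id % num_processes, part_id // num_processes
```

With 8 partitions and 4 processes, rank 1 would hold partitions 1 and 5, stored locally under local_part_id 0 and 1.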
  14. 17 Nov, 2022 1 commit
  15. 09 Nov, 2022 1 commit
  16. 08 Nov, 2022 1 commit
    • [DIST] Message size to retrieve SHUFFLE_GLOBAL_NIDs is resulting in very large... · 4cd0a685
      kylasa authored
      [DIST] Message size to retrieve SHUFFLE_GLOBAL_NIDs is resulting in very large messages and resulting in killed process (#4790)
      
      * Send out the message to the distributed lookup service in batches.
      
      * Update function signature for allgather_sizes function call.
      
      * Removed the unnecessary if statement.
      
      * Removed logging.info message, which is not needed.
      4cd0a685
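Sending lookup requests in batches could be sketched as follows. In a collective exchange every rank has to take part in the same number of rounds, so this sketch assumes the per-rank request sizes are gathered first and that ranks with fewer ids send empty batches. The helper names and this reading of the allgather of sizes are assumptions, not the actual DGL implementation.

```python
import math


def num_rounds(sizes_per_rank, batch_size):
    """Rounds needed so that the rank with the most ids can send them all."""
    return max((math.ceil(size / batch_size) for size in sizes_per_rank), default=0)


def batches(ids, batch_size, rounds):
    """Yield exactly `rounds` batches; trailing batches may be empty."""
    for r in range(rounds):
        yield ids[r * batch_size:(r + 1) * batch_size]
```

Each message is then bounded by `batch_size` ids, regardless of how many ids a rank needs to look up in total.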
  17. 07 Nov, 2022 3 commits
  18. 04 Nov, 2022 2 commits
  19. 31 Oct, 2022 1 commit
  20. 27 Oct, 2022 1 commit
  21. 26 Oct, 2022 1 commit
  22. 19 Oct, 2022 2 commits
  23. 17 Oct, 2022 1 commit
    • [Dist] Reduce peak memory in DistDGL (#4687) · b1309217
      Rhett Ying authored
      * [Dist] Reduce peak memory in DistDGL: avoid validation, release memory once loaded
      
      * remove orig_id from ndata/edata for partition_graph()
      
      * delete orig_id from ndata/edata in dist part pipeline
      
      * reduce dtype size and format before saving graphs
      
      * fix lint
      
      * ETYPE is required to be int32/64 for CSRSortByTag
      
      * fix test failure
      
      * refine
      b1309217
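Reducing dtype size before saving can be thought of as picking the narrowest integer width that still holds the largest value, with a floor for fields such as ETYPE that must stay int32/64. The sketch below illustrates that selection in plain Python; which fields the commit actually narrows, and to which widths, is not stated in the message, so the function and its `min_bits` parameter are hypothetical.

```python
def smallest_int_bits(max_value, min_bits=8):
    """Narrowest signed integer width (in bits) that can hold max_value.

    min_bits sets a floor, e.g. 32 for a field that must be int32/64.
    Assumes non-negative values such as ids or type codes.
    """
    for bits in (8, 16, 32, 64):
        if bits >= min_bits and max_value <= 2 ** (bits - 1) - 1:
            return bits
    raise OverflowError("value does not fit in a 64-bit signed integer")


def bytes_saved(num_items, bits):
    """Bytes saved per array compared with storing every item as int64."""
    return num_items * (64 - bits) // 8
```

For example, a field whose values never exceed 100 fits in 8 bits, but with `min_bits=32` it is kept at 32 bits.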
  24. 12 Oct, 2022 1 commit
  25. 11 Oct, 2022 1 commit
  26. 03 Oct, 2022 2 commits
    • ParMETIS wrapper script to enable ParMETIS to process chunked dataset format (#4605) · eae6ce2a
      kylasa authored
      * Creating ParMETIS wrapper script to run parmetis using one script from user perspective
      
      * Addressed all the CI comments from PR https://github.com/dmlc/dgl/pull/4529
      
      * Addressing CI comments.
      
      * Isort, and black changes.
      
      * Replaced python with python3
      
      * Replaced single quote with double quotes per suggestion.
      
      * Removed print statement
      
      * Addressing CI Comments.
      
      * Addressing CI review comments.
      
      * Addressing CI comments as per chime discussion with Rui
      
      * CI Comments, Black and isort changes
      
      * Align with code refactoring, black, isort and code review comments.
      
      * Addressing CI review comments, and fixing merge issues with the master branch
      
      * Updated with proper unit test skip decorator
      eae6ce2a
    • Edge Feature support for input graph datasets for dist. graph partitioning pipeline (#4623) · 1f471396
      kylasa authored
      * Added support for edge features.
      
      * Added comments and removing unnecessary print statements.
      
      * updated data_shuffle.py to remove compile error.
      
      * Replaced python3 with python to match CI test framework.
      
      * Removed unrelated files from the pull request.
      
      * Isort changes.
      
      * black changes on this file.
      
      * Addressing CI review comments.
      
      * Addressing CI comments.
      
      * Removed duplicated and resolved merge conflict code.
      
      * Addressing CI Comments from Rui.
      
      * Addressing CI comments, and fixing merge issues.
      
      * Addressing CI comments, code refactoring, isort and black
      1f471396
  27. 28 Sep, 2022 2 commits
  28. 23 Sep, 2022 1 commit
    • Garbage Collection and memory snapshot code for debugging partitioning... · ace76327
      kylasa authored
      
      Garbage Collection and memory snapshot code for debugging partitioning pipeline (target as master branch) (#4598)
      
      * Squashed commit of the following:
      
      commit e605a550b3783dd5f24eb39b6873a2e0e79be9c7
      Author: kylasa <kylasa@gmail.com>
      Date:   Thu Sep 15 14:45:39 2022 -0700
      
          Delete pyproject.toml
      
      commit f2db9e700d817212b67b5227f6472d218f0c74f2
      Author: kylasa <kylasa@gmail.com>
      Date:   Thu Sep 15 14:44:40 2022 -0700
      
          Changes suggested by isort program to sort imports.
      
      commit 5a6078beac6218a4f1fb378c169f04dda7396425
      Author: kylasa <kylasa@gmail.com>
      Date:   Thu Sep 15 14:39:50 2022 -0700
      
          addressing code review comments from the CI process.
      
      commit c8e92decb7aebeb32c7467108e16f058491443ab
      Author: kylasa <kylasa@gmail.com>
      Date:   Wed Sep 14 18:23:59 2022 -0700
      
          Corrected a typo in the import statement
      
      commit 14ddb0e9b553d5be3ed2c50d82dee671e84ad8c9
      Author: kylasa <kylasa@gmail.com>
      Date:   Tue Sep 13 18:47:34 2022 -0700
      
          Memory snapshot code for debugging memory footprint of the graph partitioning pipeline
      
      Squashed commit done
      
      * Addressing code review comments.
      
      * Update utils.py
      
      * dummy change to trigger CI tests
      Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
      ace76327
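A memory snapshot around one pipeline stage, followed by explicit garbage collection, can be sketched with the Python standard library as below. The commit message does not say which tool is used, so relying on `tracemalloc` and `gc` is an assumption, and `measure` is a hypothetical helper.

```python
import gc
import tracemalloc


def measure(work):
    """Run work() and report (current bytes, peak bytes, top allocation sites)."""
    tracemalloc.start()
    try:
        result = work()
        current, peak = tracemalloc.get_traced_memory()
        # Group live allocations by source line and keep the three largest.
        top = tracemalloc.take_snapshot().statistics("lineno")[:3]
    finally:
        tracemalloc.stop()
    del result    # drop the last reference to the stage's output
    gc.collect()  # force a collection so cyclic garbage is reclaimed too
    return current, peak, top
```

Calling `measure(lambda: [bytes(1024) for _ in range(1000)])` returns the traced memory while the list is alive, plus the source lines responsible for most of it.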
  29. 20 Sep, 2022 1 commit
  30. 15 Sep, 2022 1 commit
  31. 22 Aug, 2022 1 commit
  32. 21 Aug, 2022 1 commit