- 06 Mar, 2023 1 commit
kylasa authored
* Support for no. of chunks smaller than no. of partitions, and adding appropriate test cases. The following changes are made with this PR:
  1. Code changes for handling no. of chunks smaller than no. of partitions.
  2. Adding new test cases, which were previously deleted, for no. of chunks smaller than no. of partitions.
  3. Also adding test cases where multiple partitions are handled by a single process.
* Committing the missing files in this commit.
* lintrunner patch.
* lintrunner check
* lintrunner patch here.
* CI review comments.
- 28 Feb, 2023 1 commit
kylasa authored
Handle a corner case in the distributed lookup service: the get_partition_ids function being invoked with an empty request. This is needed because get_partition_ids uses the alltoall function, a collective call that every process must take part in even when it has nothing to look up.
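A minimal sketch of why the empty request matters, assuming a pure-Python stand-in for alltoall rather than the real collective. The helper names and the modulo ownership rule are hypothetical and are not taken from the commit; the point is only that a rank with no ids still has to contribute one (empty) buffer per peer.

```python
def build_send_buffers(requested_ids, world_size):
    """Bucket requested ids by owning rank.

    An empty request still yields world_size (empty) buckets, so the rank
    can take part in the exchange.
    """
    buckets = [[] for _ in range(world_size)]
    for nid in requested_ids:
        # Assumed ownership rule, for illustration only.
        buckets[nid % world_size].append(nid)
    return buckets


def simulated_alltoall(send):
    """Stand-in for alltoall: send[i][j] goes from rank i to rank j.

    Returns recv where recv[j][i] == send[i][j].
    """
    world_size = len(send)
    return [
        [send[src][dst] for src in range(world_size)]
        for dst in range(world_size)
    ]


# Rank 1 has an empty request but still participates.
requests = [[0, 3, 5], [], [4]]
send = [build_send_buffers(ids, world_size=3) for ids in requests]
recv = simulated_alltoall(send)
```

With a real collective, a rank that skipped the call because its request was empty would leave the other ranks waiting, which is the situation this sketch is meant to illustrate.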
- 25 Feb, 2023 1 commit
kylasa authored
* Implemented the following changes.
  * Remove NUM_NODES_PER_CHUNK
  * Remove NUM_EDGES_PER_CHUNK
  * Remove the dependency between no. of edge files per edge type and no. of partitions
  * Remove the dependency between no. of edge feature files per edge type and no. of partitions
  * Remove the dependency between no. of edge feature files and no. of edge files per edge type.
  * Remove the dependency between no. of node feature files and no. of partitions
  * Add “node_type_counts”. This will be a list of integers. Each integer will represent the total count of a node-type. The index in this list and the index in the “node_type” list will be the same for a given node-type.
  * Add “edge_type_counts”. This will be a list of integers. Each integer will represent the total count of an edge-type. The index in this list and the index in the “edge_type” list will be the same for a given edge-type.
* Applying lintrunner patch.
* Adding missing keys to the metadata in the unit test framework.
* lintrunner patch.
* Resolving CI test failures due to merge conflicts.
* Applying lintrunner patch
* applying lintrunner patch
* Replacing tabspace with spaces - to satisfy lintrunner
* Fixing the CI Test Failure cases.
* Applying lintrunner patch
* lintrunner complaining about a blank line.
* Resolving issues with print statement for NoneType
* Removed the arbitrary chunks tests, since this functionality is not supported anymore.
* Addressing CI review comments.
* addressing CI review comments
* lintrunner patch
* lintrunner patch.
* Addressing CI review comments.
* lintrunner patch.
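The index alignment described for “node_type_counts” and “edge_type_counts” can be sketched as follows. The metadata fragment, the type names, the counts, and the `count_for` helper are all hypothetical and serve only to illustrate the shared-index convention.

```python
# Hypothetical metadata fragment; type names and counts are made up.
metadata = {
    "node_type": ["paper", "author"],
    "node_type_counts": [1000, 250],
    "edge_type": ["author:writes:paper", "paper:cites:paper"],
    "edge_type_counts": [3000, 5000],
}


def count_for(metadata, kind, type_name):
    """Return the total count for a node or edge type.

    kind is "node" or "edge". The count is found at the same index that
    type_name occupies in the matching "<kind>_type" list.
    """
    names = metadata[f"{kind}_type"]
    counts = metadata[f"{kind}_type_counts"]
    if len(names) != len(counts):
        raise ValueError(
            f"{kind}_type and {kind}_type_counts differ in length"
        )
    return counts[names.index(type_name)]


author_count = count_for(metadata, "node", "author")
cites_count = count_for(metadata, "edge", "paper:cites:paper")
```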
- 23 Feb, 2023 1 commit
kylasa authored
[DistDGL][Mem_Optimizations] get_partition_ids, service provided by the distributed lookup service has high memory footprint (#5226)
* get_partition_ids, service provided by the distributed lookup service has high memory footprint

  The 'get_partition_ids' function, which is used to retrieve the owner processes of a given list of global node ids, has a high memory footprint. Currently this is of the order of 8x the size of the input list. For massively large datasets, these memory needs are unrealistic and may result in OOM. In the case of CoreGraph, when retrieving owners for an edge list of 6 Billion edges, the memory needs can be as high as 8*8*8 = 256 GB.

  To limit the amount of memory used by this function, we split the message sent to the distributed lookup service so that each message carries at most 200 million global node ids. This reduces the memory footprint of the entire function to no more than 0.2 * 8 * 8 ≈ 13 GB, which is within reasonable limits.

  Since we now send multiple small messages instead of one large message to the distributed lookup service, this may consume more wall-clock time than the earlier implementation.
* lintrunner patch.
* using np.ceil() per suggestion.
* converting the output of np.ceil() to ints.
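The message splitting described above might look roughly like this. It is a sketch, not the code from the PR: the function names are hypothetical, `math.ceil` stands in for the `np.ceil()` mentioned in the commit so that the example needs only the standard library, and the 8-byte id size and 8x overhead are the figures quoted in the message.

```python
import math

# Cap quoted in the commit message: 200 million global node ids per message.
MAX_IDS_PER_MESSAGE = 200_000_000


def split_into_messages(num_ids, max_per_message=MAX_IDS_PER_MESSAGE):
    """Return (start, end) index ranges, each covering at most max_per_message ids."""
    if num_ids == 0:
        return []
    # math.ceil already returns an int; the int() mirrors the cast that
    # np.ceil() needs, since np.ceil() returns a float.
    num_messages = int(math.ceil(num_ids / max_per_message))
    return [
        (i * max_per_message, min((i + 1) * max_per_message, num_ids))
        for i in range(num_messages)
    ]


def estimated_peak_gb(ids_per_message, bytes_per_id=8, overhead_factor=8):
    """Rough peak memory for one message, assuming 8-byte ids and 8x overhead."""
    return ids_per_message * bytes_per_id * overhead_factor / 1e9


ranges = split_into_messages(10, max_per_message=4)
per_message_gb = estimated_peak_gb(MAX_IDS_PER_MESSAGE)
```

The trade-off is the one the commit names: a bounded footprint per message in exchange for more round trips to the lookup service.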
- 19 Feb, 2023 1 commit
Hongzhi (Steve), Chen authored
Co-authored-by: Ubuntu <ubuntu@ip-172-31-28-63.ap-northeast-1.compute.internal>
- 03 Feb, 2023 1 commit
kylasa authored
* lintrunner patch for gloo_wrapper.py
* lintrunner changes to the tools directory.
- 18 Nov, 2022 1 commit
kylasa authored
* Flexible pipeline - Initial commit
  1. Implementation of flexible pipeline feature.
  2. With this implementation, the pipeline now supports multiple partitions per process. It also assumes that num_partitions is always a multiple of num_processes.
* Update test_dist_part.py
* Code changes to address review comments
* Code refactoring of exchange_features function into two functions for better readability
* Updating test_dist_part to fix merge issues with the master branch
* corrected variable names...
* Fixed code refactoring issues.
* Provide missing function arguments to exchange_feature function
* Providing the missing function argument to fix error.
* Provide missing function argument to 'get_shuffle_nids' function.
* Repositioned a variable within its scope.
* Removed tab space which is causing the indentation problem
* Fix issue with the CI test framework, which is the root cause for the failure of the CI tests.
  1. Now we read files specific to the partition-id and store this data separately, identified by the local_part_id, in the local process.
  2. Similarly, we also differentiate the node and edge features type_ids with the same keys as above.
  3. These two changes will help us get the appropriate feature data during the feature exchange and send it to the destination process correctly.
* Correct the parametrization for the CI unit test cases.
* Addressing Rui's code review comments.
* Addressing code review comments.
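One way to picture multiple partitions per process is sketched below. The round-robin assignment and both function names are assumptions made for illustration; the commit states only that num_partitions must be a multiple of num_processes and that data is keyed by local_part_id.

```python
def partitions_for_rank(rank, num_processes, num_partitions):
    """List the partition ids handled by one process.

    Assumes a round-robin assignment (hypothetical). The position of a
    partition id in the returned list plays the role of local_part_id.
    """
    if num_partitions % num_processes != 0:
        raise ValueError("num_partitions must be a multiple of num_processes")
    parts_per_process = num_partitions // num_processes
    return [
        rank + local_part_id * num_processes
        for local_part_id in range(parts_per_process)
    ]


def owner_rank(part_id, num_processes):
    """Inverse of the assumed round-robin assignment."""
    return part_id % num_processes


# Two processes sharing six partitions.
rank0_parts = partitions_for_rank(0, num_processes=2, num_partitions=6)
rank1_parts = partitions_for_rank(1, num_processes=2, num_partitions=6)
```

Under this assumption every partition has exactly one owner, and each process holds the same number of partitions.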
- 17 Nov, 2022 1 commit
Serge Panev authored
Signed-off-by: Serge Panev <spanev@nvidia.com>
Signed-off-by: Serge Panev <spanev@nvidia.com>
- 07 Nov, 2022 2 commits
- 03 Oct, 2022 1 commit
kylasa authored
* Added support for edge features.
* Added comments and removing unnecessary print statements.
* updated data_shuffle.py to remove compile error.
* Replaced python3 with python to match CI test framework.
* Removed unrelated files from the pull request.
* Isort changes.
* black changes on this file.
* Addressing CI review comments.
* Addressing CI comments.
* Removed duplicated and resolved merge conflict code.
* Addressing CI Comments from Rui.
* Addressing CI comments, and fixing merge issues.
* Addressing CI comments, code refactoring, isort and black
- 23 Sep, 2022 1 commit
kylasa authored
Garbage Collection and memory snapshot code for debugging partitioning pipeline (target as master branch) (#4598)
* Squashed commit of the following:

  commit e605a550b3783dd5f24eb39b6873a2e0e79be9c7
  Author: kylasa <kylasa@gmail.com>
  Date: Thu Sep 15 14:45:39 2022 -0700

      Delete pyproject.toml

  commit f2db9e700d817212b67b5227f6472d218f0c74f2
  Author: kylasa <kylasa@gmail.com>
  Date: Thu Sep 15 14:44:40 2022 -0700

      Changes suggested by isort program to sort imports.

  commit 5a6078beac6218a4f1fb378c169f04dda7396425
  Author: kylasa <kylasa@gmail.com>
  Date: Thu Sep 15 14:39:50 2022 -0700

      addressing code review comments from the CI process.

  commit c8e92decb7aebeb32c7467108e16f058491443ab
  Author: kylasa <kylasa@gmail.com>
  Date: Wed Sep 14 18:23:59 2022 -0700

      Corrected a typo in the import statement

  commit 14ddb0e9b553d5be3ed2c50d82dee671e84ad8c9
  Author: kylasa <kylasa@gmail.com>
  Date: Tue Sep 13 18:47:34 2022 -0700

      Memory snapshot code for debugging memory footprint of the graph partitioning pipeline

  Squashed commit done
* Addressing code review comments.
* Update utils.py
* dummy change to trigger CI tests

Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
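The commit's own helper is not shown in this log. As a rough illustration of the idea, the sketch below forces a garbage-collection pass and then records the largest allocation sites, using Python's standard `gc` and `tracemalloc` modules as a stand-in. The `memory_snapshot` function and its tag argument are hypothetical.

```python
import gc
import tracemalloc


def memory_snapshot(tag, top_n=3):
    """Run a GC pass, then report the top allocation sites.

    Returns (tag, number of unreachable objects found by gc.collect(),
    list of tracemalloc statistics grouped by source line).
    tracemalloc must already be started by the caller.
    """
    collected = gc.collect()
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics("lineno")[:top_n]
    return tag, collected, top_stats


tracemalloc.start()
data = [list(range(1000)) for _ in range(100)]  # some allocations to observe
tag, collected, top_stats = memory_snapshot("after-load")
for stat in top_stats:
    print(tag, stat)
tracemalloc.stop()
```

Calling such a helper between pipeline stages gives a per-stage view of where memory is held, at the cost of the tracing overhead that `tracemalloc` adds.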
- 17 Aug, 2022 1 commit
kylasa authored
* Distributed Lookup service which is for retrieving global_nids to shuffle-global-nids/partition-id mappings
  1. Implemented a class to provide distributed lookup service
  2. This class can be used to retrieve global-nids mappings
* Code changes to address CI comments.
  1. Removed some unneeded type_casts to numpy.int64
  2. Added additional comments when iterating over the partition-ids list.
  3. Added docstring to the class and adjusted comments where it is relevant.
* Updated code comments and variable names...
  1. Changed the variable names to appropriately represent the values stored in these variables.
  2. Updated the docstring correctly.
* Corrected docstring as per the suggestion... and removed all the capital letters for Global nids and Shuffle Global nids...
* Addressing CI review comments.