- 11 Sep, 2023 1 commit
-
-
9rum authored
-
- 29 Jun, 2023 1 commit
-
-
Rhett Ying authored
-
- 27 Apr, 2023 1 commit
-
-
Theodore Vasiloudis authored
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
-
- 13 Apr, 2023 1 commit
-
-
kylasa authored
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 24 Mar, 2023 1 commit
-
-
Hongzhi (Steve), Chen authored
Co-authored-by: Ubuntu <ubuntu@ip-172-31-28-63.ap-northeast-1.compute.internal>
-
- 10 Mar, 2023 2 commits
-
-
kylasa authored
* Replacing numpy's unique with custom implementation
* Added docstring to the new function.
* Adding unit tests
* Numpy's version issues with the 'kind' argument.
* Addressing CI Test Failure.
* Addressing CI review comments.
* revised implementation, optimized for time.
* added missing arguments for fallback case.
* Addressing CI test failures.
* Resolving issues with PYTHONPATH
* Fix CI Test Failure issues.
* fix CI test failures.
---------
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
-
kylasa authored
* Added testcase for testing distributed lookup service.
* Applying lintrunner patch.
* Fixing CI Test environment failures.
* lintrunner patch.
* lintrunner patch
* Fix CI Failure.
* Fixing CI Test failure cases.
* lintrunner patch.
* lintrunner patch and CI test failure.
* Restore no. of test cases.
* Resolving pythonpath issues.
* lintrunner patch.
* updating PYTHONPATH to resolve lib path
* Resolve merge conflicts
* Resolving issues with PYTHONPATH env variable.
* fix module path
* rename utils script under test to avoid ambiguity
* remove unnecessary pythonpath
* fix lint error
* fix lint error
---------
Co-authored-by: RhettYing <rhett_ying@qq.com>
-
- 06 Mar, 2023 2 commits
-
-
kylasa authored
* Sync parmetis_wrapper with changes in metadata.json
  1. In preprocess.py, make sure that num_partitions is defined as an input argument. Also, align 'input_dir' with the input dataset. schema_file is assumed to be located inside input_dir, and the graph_stats.txt file is also assumed to be present in input_dir.
  2. Use the DGL_HOME environment variable so that the parmetis_wrapper command can be run from anywhere.
* Fix CI test failure cases.
* Addressing CI review comments.
* Addressing CI test failures.
* Applying lintrunner patch
-
kylasa authored
* Support for no. of chunks smaller than no. of partitions, and adding appropriate test cases. The following changes are made in this PR.
  1. Code changes for handling no. of chunks smaller than no. of partitions.
  2. Adding new test cases, which were previously deleted, for no. of chunks smaller than no. of partitions.
  3. Also adding test cases where multiple partitions are handled by a single process.
* Committing the missing files in this commit.
* lintrunner patch.
* lintrunner check
* lintrunner patch here.
* CI review comments.
-
- 25 Feb, 2023 1 commit
-
-
kylasa authored
* Implemented the following changes.
  * Remove NUM_NODES_PER_CHUNK
  * Remove NUM_EDGES_PER_CHUNK
  * Remove the dependency between no. of edge files per edge type and no. of partitions
  * Remove the dependency between no. of edge feature files per edge type and no. of partitions
  * Remove the dependency between no. of edge feature files and no. of edge files per edge type.
  * Remove the dependency between no. of node feature files and no. of partitions
  * Add “node_type_counts”. This will be a list of integers. Each integer will represent the total count of a node-type. The index in this list and the index in the “node_type” list will be the same for a given node-type.
  * Add “edge_type_counts”. This will be a list of integers. Each integer will represent the total count of an edge-type. The index in this list and the index in the “edge_type” list will be the same for a given edge-type.
* Applying lintrunner patch.
* Adding missing keys to the metadata in the unit test framework.
* lintrunner patch.
* Resolving CI test failures due to merge conflicts.
* Applying lintrunner patch
* applying lintrunner patch
* Replacing tabspace with spaces - to satisfy lintrunner
* Fixing the CI Test Failure cases.
* Applying lintrunner patch
* lintrunner complaining about a blank line.
* Resolving issues with print statement for NoneType
* Removed the arbitrary-chunks tests, since this functionality is not supported anymore.
* Addressing CI review comments.
* addressing CI review comments
* lintrunner patch
* lintrunner patch.
* Addressing CI review comments.
* lintrunner patch.
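As a rough illustration of the “node_type_counts” and “edge_type_counts” keys described in this commit, the sketch below pairs each type name with its count by list index. The type names, counts, and the helper `counts_by_type` are invented for the example; only the key names and the index-alignment rule come from the commit message.

```python
# Hypothetical metadata fragment. The type names and counts are made up;
# the commit message only specifies the keys and that indices line up.
metadata = {
    "node_type": ["paper", "author"],
    "node_type_counts": [5, 3],
    "edge_type": ["author:writes:paper", "paper:cites:paper"],
    "edge_type_counts": [7, 4],
}


def counts_by_type(meta, type_key, counts_key):
    """Pair each type name with its total count using the shared list index."""
    names = meta[type_key]
    counts = meta[counts_key]
    if len(names) != len(counts):
        raise ValueError(
            f"{type_key} has {len(names)} entries but {counts_key} has {len(counts)}"
        )
    return dict(zip(names, counts))


node_counts = counts_by_type(metadata, "node_type", "node_type_counts")
edge_counts = counts_by_type(metadata, "edge_type", "edge_type_counts")
print(node_counts)
print(edge_counts)
```

Checking the two list lengths up front catches a metadata file in which a type was added without its count.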
-
- 23 Feb, 2023 1 commit
-
-
kylasa authored
* A new script to validate graph partitioning pipeline
* Addressing CI review comments.
* lintrunner patch.
-
- 19 Feb, 2023 1 commit
-
-
Hongzhi (Steve), Chen authored
* auto-format-test
* more
* remove
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-28-63.ap-northeast-1.compute.internal>
-
- 13 Feb, 2023 1 commit
-
-
kylasa authored
The following changes are made in this PR.
1. In dataset_utils.py, when reading edges from disk we follow the order defined by the STR_EDGE_TYPE key in the metadata.json file. This order is implicitly used to assign edge IDs to edge types, and the same order is used to read edges from the disk as well.
2. The unit test framework now also randomizes the order of edges read from the disk. This applies to edges read from the disk for the unit tests.
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
-
- 05 Jan, 2023 1 commit
-
-
Theodore Vasiloudis authored
* Allow reading and writing single-column vector Parquet files. These files are commonly produced by Spark ML's feature processing code.
* [Dist] Only write single-column vector files for Parquet in tests.
-
- 03 Jan, 2023 1 commit
-
-
Theodore Vasiloudis authored
[Dist] Add support for Parquet-formatted edges files, remove some assumptions on edge file number. (#5051)
* [Dist] Add support for Parquet-formatted edges files, remove some assumptions on edge file number.
* [Dist] Add parquet edges option to unit tests.
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 15 Dec, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] enable chunking node/edge data into an arbitrary number of chunks
* [Dist] enable splitting node/edge data into arbitrary parts
* refine code
* Forcibly cast boolean to uint8 to avoid dist.scatter failure
* convert boolean to int8 before scatter and revert it after scatter
* refine code
* fix test
* refine code
* move test utilities into utils.py
* update comment
* fix empty data
* update
* update
* fix empty data issue
* release unnecessary mem
* release unnecessary mem
* release unnecessary mem
* release unnecessary mem
* release unnecessary mem
* remove unnecessary shuffle data
* separate array_split into standalone utility
* add example
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 14 Dec, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] generate partition meta for ParMETIS
-
- 28 Nov, 2022 1 commit
-
-
peizhou001 authored
-
- 18 Nov, 2022 1 commit
-
-
kylasa authored
* Flexible pipeline - Initial commit
  1. Implementation of flexible pipeline feature.
  2. With this implementation, the pipeline now supports multiple partitions per process. It also assumes that num_partitions is always a multiple of num_processes.
* Update test_dist_part.py
* Code changes to address review comments
* Code refactoring of exchange_features function into two functions for better readability
* Updating test_dist_part to fix merge issues with the master branch
* corrected variable names...
* Fixed code refactoring issues.
* Provide missing function arguments to exchange_feature function
* Providing the missing function argument to fix error.
* Provide missing function argument to 'get_shuffle_nids' function.
* Repositioned a variable within its scope.
* Removed tab space which is causing the indentation problem
* Fix issue with the CI test framework, which is the root cause for the failure of the CI tests.
  1. Now we read files specific to the partition-id and store this data separately, identified by the local_part_id, in the local process.
  2. Similarly, we also differentiate the node and edge feature type_ids with the same keys as above.
  3. These two changes will help us get the appropriate feature data during the feature exchange and send it to the destination process correctly.
* Correct the parametrization for the CI unit test cases.
* Addressing Rui's code review comments.
* Addressing code review comments.
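The constraint above, that num_partitions is a multiple of num_processes and each process handles several partitions, can be sketched as follows. This is only an illustration: the function name `partitions_for_rank` is hypothetical, and the contiguous-block assignment is an assumption, since the commit message does not say how partitions are mapped to processes.

```python
def partitions_for_rank(rank, num_processes, num_partitions):
    """Return the global partition ids handled by one process.

    Assumption (not stated in the commit): partitions are assigned in
    contiguous blocks, so rank r owns ids [r * k, (r + 1) * k) where
    k = num_partitions // num_processes.
    """
    if num_processes <= 0 or not 0 <= rank < num_processes:
        raise ValueError("rank must be in [0, num_processes)")
    if num_partitions % num_processes != 0:
        raise ValueError("num_partitions must be a multiple of num_processes")
    per_process = num_partitions // num_processes
    start = rank * per_process
    return list(range(start, start + per_process))


# With 4 partitions on 2 processes, each process owns 2 partitions.
# The position within the returned list plays the role of a local part id.
for rank in range(2):
    owned = partitions_for_rank(rank, num_processes=2, num_partitions=4)
    for local_part_id, global_part_id in enumerate(owned):
        print(rank, local_part_id, global_part_id)
```

Rejecting a non-multiple up front mirrors the stated assumption, so no process is left with an uneven share.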
-
- 04 Nov, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] deprecate etype and always use canonical etype for partition and load
* enable canonical etypes in dist part pipeline
* resolve rebase conflicts
* fix lint
* fix test failure
* throw exception if outdated part config is loaded
* refine
* refine
* revert unnecessary change
* fix typo
-
- 27 Oct, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] fix etype issue in dist part pipeline
* add comments
-
- 26 Oct, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] reduce startup overhead: enable to save in specified formats
* [Dist] reduce startup overhead: sort partitions when generating
* sort csc/csr only when multiple etypes
* refine
-
- 19 Oct, 2022 2 commits
-
-
peizhou001 authored
* add a standalone tool for changing etypes to canonical etypes in part config
-
Rhett Ying authored
* [Dist] decouple num_chunks and num_parts for graphs with edge feature
* fix test failure
-
- 17 Oct, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] Reduce peak memory in DistDGL: avoid validation, release memory once loaded
* remove orig_id from ndata/edata for partition_graph()
* delete orig_id from ndata/edata in dist part pipeline
* reduce dtype size and format before saving graphs
* fix lint
* ETYPE must be int32/64 for CSRSortByTag
* fix test failure
* refine
-
- 11 Oct, 2022 1 commit
-
-
Hongzhi (Steve), Chen authored
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
- 03 Oct, 2022 2 commits
-
-
kylasa authored
* Creating ParMETIS wrapper script to run parmetis using one script from user perspective
* Addressed all the CI comments from PR https://github.com/dmlc/dgl/pull/4529
* Addressing CI comments.
* Isort, and black changes.
* Replaced python with python3
* Replaced single quote with double quotes per suggestion.
* Removed print statement
* Addressing CI Comments.
* Addressing CI review comments.
* Addressing CI comments as per chime discussion with Rui
* CI Comments, Black and isort changes
* Align with code refactoring, black, isort and code review comments.
* Addressing CI review comments, and fixing merge issues with the master branch
* Updated with proper unit test skip decorator
-
kylasa authored
* Added support for edge features.
* Added comments and removing unnecessary print statements.
* updated data_shuffle.py to remove compile error.
* Replaced python3 with python to match CI test framework.
* Removed unrelated files from the pull request.
* Isort changes.
* black changes on this file.
* Addressing CI review comments.
* Addressing CI comments.
* Removed duplicated and resolved merge conflict code.
* Addressing CI Comments from Rui.
* Addressing CI comments, and fixing merge issues.
* Addressing CI comments, code refactoring, isort and black
-
- 28 Sep, 2022 2 commits
-
-
Rhett Ying authored
* [Dist] enable partitioning many chunks into fewer partitions via pipeline
* refine
* add meta file for num_parts, add more tests, refine docstring
* remove args.num_parts
* create pydantic class for partition metadata
* refine
* rename json file
-
Rhett Ying authored
* [Dist] save original node/edge IDs into separate files
* separate nids and eids
-
- 20 Sep, 2022 1 commit
-
-
peizhou001 authored
-
- 15 Sep, 2022 1 commit
-
-
Rhett Ying authored
* [DistPart] expose timeout config for process group
* refine code
* Update tools/distpartitioning/data_proc_pipeline.py
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-
- 22 Aug, 2022 1 commit
-
-
Mufei Li authored
* Update distributed-preprocessing.rst
* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
-
- 19 Aug, 2022 1 commit
-
-
Mufei Li authored
* chunked graph data format
* Update
* Update
* Update task_distributed_test.sh
* Update
* Update
* Revert "Update"
  This reverts commit 03c461870f19375fb03125b061fc853ab555577f.
* Update
* Update
* ssh-keygen
* CI
* install openssh
* openssh
* Update
* CI
* Update
* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-53-142.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-16-87.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-20-21.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
-
- 14 Jun, 2022 1 commit
-
-
Rhett Ying authored
* [Dist] master port should be fixed for all trainers
* add tests for tools/launch.py
-
- 17 Aug, 2021 1 commit
-
-
Eric Kim authored
[Tools] In `tools/launch.py`, correctly pass all DGL client/server env vars if udf is a multi-command (#3245)
* Correctly pass all DGL client/server env vars if udf is a multi-command
* Refactor to use wrap_cmd_with_local_envvars() helper fn
-
- 02 Aug, 2021 1 commit
-
-
Eric Kim authored
* Refactors torch dist launcher udf-wrap code to handle more python versions
* minor changes
-