Commits · 432c71ef25c7780051cc02e1e573a494df9c0b3c · OpenDAS / dgl

13 Feb, 2023 1 commit

Code changes to fix order sensitivity of the pipeline (#5288) · 432c71ef

kylasa authored Feb 13, 2023

The following changes are made in this PR.
1. In dataset_utils.py, edges are now read from disk in the order defined by the STR_EDGE_TYPE key in the metadata.json file. This order is already used implicitly to assign edge IDs to edge types, so edges are now read from disk in that same order.
2. The unit test framework now randomizes the order in which edges are read from disk, so the tests exercise this order sensitivity.
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
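The sketch below shows why the read order matters. Edge IDs are handed out as contiguous ranges in metadata order, so the per-type edge lists must be concatenated in that same order, however the files were actually read. It is a minimal sketch under stated assumptions and is not the `dataset_utils.py` code:

- The exact metadata key behind `STR_EDGE_TYPE` is not shown here. A plain Python list stands in for it.
- The edge type names are illustrative.
- `edge_id_ranges` and `assemble_edges` are hypothetical helpers.

```python
import random

def edge_id_ranges(edge_types, edge_counts):
    """Give each edge type a contiguous [start, end) range of global edge IDs.

    IDs are handed out in the order of `edge_types` (standing in for the
    order listed in metadata.json), so that order defines the ID layout.
    """
    ranges = {}
    offset = 0
    for etype in edge_types:
        count = edge_counts[etype]
        ranges[etype] = (offset, offset + count)
        offset += count
    return ranges

def assemble_edges(edge_types, loaded):
    """Concatenate per-type (src, dst) pairs in metadata order.

    `loaded` maps edge type -> list of (src, dst) pairs and may have been
    filled in any order (e.g. files read in a shuffled order).
    """
    edges = []
    for etype in edge_types:  # follow metadata order, not the read order
        edges.extend(loaded[etype])
    return edges

if __name__ == "__main__":
    # Hypothetical edge types, listed in "metadata" order.
    edge_types = ["user:follows:user", "user:buys:item", "item:sold_by:shop"]
    data = {
        "user:follows:user": [(0, 1), (1, 2)],
        "user:buys:item": [(0, 5)],
        "item:sold_by:shop": [(5, 9), (6, 9), (7, 8)],
    }
    expected = assemble_edges(edge_types, data)
    # Mimic the randomized read order used by the unit tests.
    items = list(data.items())
    random.shuffle(items)
    assert assemble_edges(edge_types, dict(items)) == expected
    print(edge_id_ranges(edge_types, {et: len(e) for et, e in data.items()}))
    # {'user:follows:user': (0, 2), 'user:buys:item': (2, 3), 'item:sold_by:shop': (3, 6)}
```

Shuffling the load order, as the unit tests now do, catches any code path that silently relies on dictionary or file-listing order instead of the metadata order.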


05 Jan, 2023 1 commit

[Dist] Allow reading and writing single-column vector Parquet files. (#5098) · 9890201d

Theodore Vasiloudis authored Jan 05, 2023

* Allow reading and writing single-column vector Parquet files.

These files are commonly produced by Spark ML's feature processing code.

* [Dist] Only write single-column vector files for Parquet in tests.
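In a "single-column vector" Parquet file, each row stores a whole feature vector in one list-typed column. This differs from the one-scalar-column-per-dimension layout. The sketch below converts between the two layouts in memory. It assumes the columns arrive as a name-to-list dict, which is the shape `pyarrow.Table.to_pydict()` returns. The helper names and the default column name `"features"` are hypothetical and are not the DGL API.

```python
import numpy as np

def columns_to_matrix(columns):
    """Turn a dict of Parquet columns (name -> list of values) into a 2-D array.

    Two layouts are handled:
    * a single column whose cells are vectors (Spark ML style), and
    * one scalar column per feature dimension.
    """
    names = list(columns)
    if len(names) == 1:
        cells = columns[names[0]]
        if len(cells) > 0 and isinstance(cells[0], (list, tuple, np.ndarray)):
            # Single-column vector layout: stack the per-row vectors.
            return np.asarray(cells, dtype=np.float64)
    # Scalar layout: each column is one feature dimension.
    return np.column_stack([np.asarray(columns[n], dtype=np.float64) for n in names])

def matrix_to_single_column(matrix, name="features"):
    """Store each row of a 2-D array as one vector cell in a single column."""
    return {name: [row.tolist() for row in np.asarray(matrix)]}

if __name__ == "__main__":
    feats = np.arange(6, dtype=np.float64).reshape(3, 2)
    cols = matrix_to_single_column(feats)
    print(cols)  # {'features': [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]}
    assert np.array_equal(columns_to_matrix(cols), feats)
```

With pyarrow installed, these helpers can be combined with real files. `pyarrow.parquet.read_table(path).to_pydict()` produces the input for `columns_to_matrix`. `pyarrow.parquet.write_table(pyarrow.table(matrix_to_single_column(x)), path)` writes the single-column layout.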


03 Jan, 2023 1 commit

[Dist] Add support for Parquet-formatted edges files, remove some assumptions on edge file number. (#5051) · 774709d3

Theodore Vasiloudis authored Jan 03, 2023

* [Dist] Add support for Parquet-formatted edges files, remove some assumptions on edge file number.

* [Dist] Add parquet edges option to unit tests.
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
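Removing assumptions on the number of edge files means the reader simply loops over whatever files are listed, whether there are zero, one, or many. The sketch below shows that loop with a per-format dispatch. It is a minimal sketch with several assumptions:

- The text format is taken to be one whitespace-separated `src dst` pair per line.
- For Parquet, the first two columns are assumed to hold source and destination IDs.
- `read_edge_file` and `read_edges` are hypothetical helpers.

The DGL implementation and its metadata-driven format options may differ.

```python
import os
import tempfile

def read_edge_file(path, fmt):
    """Return (src, dst) lists from a single edge file.

    fmt == "csv": whitespace-separated "src dst" pairs, one per line (assumed).
    fmt == "parquet": first two columns taken as src and dst (assumed).
    """
    if fmt == "csv":
        src, dst = [], []
        with open(path) as f:
            for line in f:
                parts = line.split()
                if not parts:
                    continue  # skip blank lines
                src.append(int(parts[0]))
                dst.append(int(parts[1]))
        return src, dst
    if fmt == "parquet":
        import pyarrow.parquet as pq  # only imported when Parquet input is used
        table = pq.read_table(path)
        return table.column(0).to_pylist(), table.column(1).to_pylist()
    raise ValueError(f"unsupported edge file format: {fmt}")

def read_edges(paths, fmt):
    """Concatenate edges from any number of files, in the order given."""
    src, dst = [], []
    for path in paths:  # no assumption about how many files there are
        s, d = read_edge_file(path, fmt)
        src.extend(s)
        dst.extend(d)
    return src, dst

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, text in enumerate(["0 1\n1 2\n", "2 3\n", "3 0\n"]):
            path = os.path.join(tmp, f"edges-{i}.txt")
            with open(path, "w") as f:
                f.write(text)
            paths.append(path)
        print(read_edges(paths, "csv"))  # ([0, 1, 2, 3], [1, 2, 3, 0])
```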


15 Dec, 2022 1 commit

[Dist] enable to chunk node/edge data into arbitrary number of chunks (#4930) · 9731e023

Rhett Ying authored Dec 15, 2022

* [Dist] Enable chunking node/edge data into an arbitrary number of chunks

* [Dist] Enable splitting node/edge data into an arbitrary number of parts

* refine code

* Force boolean data to uint8 to avoid a dist.scatter failure

* convert boolean to int8 before scatter and revert it after scatter

* refine code

* fix test

* refine code

* move test utilities into utils.py

* update comment

* fix empty data

* update

* fix empty data issue

* release unnecessary mem

* remove unnecessary shuffle data

* separate array_split into standalone utility

* add example
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
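This commit has two technical points that a short sketch can show:

- **Splitting into any number of chunks.** `numpy.array_split`, named in the commit message, splits an array along axis 0 even when its length is not divisible by the number of chunks, and it can yield empty chunks.
- **Casting booleans for scatter.** Boolean arrays are cast to an 8-bit integer type before scatter and back afterwards. The commit mentions both uint8 and int8; this sketch uses uint8.

The helper names are hypothetical, and no actual `torch.distributed` call is made here.

```python
import numpy as np

def split_into_chunks(data, num_chunks):
    """Split `data` along axis 0 into `num_chunks` nearly equal parts.

    np.array_split gives the first len(data) % num_chunks chunks one extra
    row; chunks are empty when num_chunks > len(data).
    """
    return np.array_split(data, num_chunks)

def to_wire(arr):
    """Cast a boolean array to uint8 before communication.

    Returns (array_to_send, was_bool) so the receiver can undo the cast.
    """
    if arr.dtype == np.bool_:
        return arr.astype(np.uint8), True
    return arr, False

def from_wire(arr, was_bool):
    """Undo to_wire after communication."""
    return arr.astype(np.bool_) if was_bool else arr

if __name__ == "__main__":
    print([len(c) for c in split_into_chunks(np.arange(10), 3)])  # [4, 3, 3]
    mask = np.array([True, False, True])
    wire, was_bool = to_wire(mask)
    assert wire.dtype == np.uint8
    assert np.array_equal(from_wire(wire, was_bool), mask)
```

The cast is lossless because `True`/`False` map to 1/0 and back. Keeping the `was_bool` flag next to the payload lets non-boolean data pass through untouched.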
