- 13 Feb, 2023 1 commit
kylasa authored
The following changes are made in this PR.
1. In dataset_utils.py, edges are now read from disk in the order defined by the STR_EDGE_TYPE key in the metadata.json file. This order is implicitly used to assign edgeid to edge types, so the same order is used when reading edges from disk.
2. The unit test framework now also randomizes the order in which edges are read from disk.
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
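Why the read order matters can be illustrated with a minimal sketch. It is not the code from dataset_utils.py: the `"edge_type"` key name, the edge type names, the edge counts, and the helper `edge_id_ranges` are all hypothetical. It assumes edge IDs are handed out as contiguous ranges, one range per edge type, in metadata order.

```python
import random

# Hypothetical stand-in for metadata.json; the "edge_type" key name and the
# edge type names are illustrative assumptions, not taken from the PR.
metadata = {"edge_type": ["user:follows:user", "user:buys:item", "item:similar:item"]}

# Hypothetical number of edges found on disk for each edge type.
edge_counts = {"user:buys:item": 5, "item:similar:item": 2, "user:follows:user": 3}


def edge_id_ranges(metadata, edge_counts, key="edge_type"):
    """Map each edge type to (type_id, first_edge_id, end_edge_id).

    Iteration follows the order stored in the metadata, never the order in
    which the files happened to be discovered on disk.
    """
    ranges = {}
    start = 0
    for type_id, etype in enumerate(metadata[key]):
        end = start + edge_counts[etype]
        ranges[etype] = (type_id, start, end)
        start = end
    return ranges


# Shuffle the discovery order, as a randomized unit test might do.
discovered = list(edge_counts.items())
random.shuffle(discovered)
shuffled_counts = dict(discovered)

# The assignment is unchanged because only the metadata order is used.
assert edge_id_ranges(metadata, shuffled_counts) == edge_id_ranges(metadata, edge_counts)
```

If the reader iterated over the discovered files instead of the metadata list, the ranges would depend on file system order, which is the kind of mismatch a randomized test is meant to expose.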
-
- 05 Jan, 2023 1 commit
Theodore Vasiloudis authored
* Allow reading and writing single-column vector Parquet files. These files are commonly produced by Spark ML's feature processing code.
* [Dist] Only write single-column vector files for Parquet in tests.
-
- 03 Jan, 2023 1 commit
Theodore Vasiloudis authored
[Dist] Add support for Parquet-formatted edges files, remove some assumptions on edge file number. (#5051)
* [Dist] Add support for Parquet-formatted edges files, remove some assumptions on edge file number.
* [Dist] Add parquet edges option to unit tests.
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
-
- 15 Dec, 2022 1 commit
Rhett Ying authored
* [Dist] enable chunking node/edge data into an arbitrary number of chunks
* [Dist] enable splitting node/edge data into arbitrary parts
* refine code
* Forcibly format boolean to uint8 to avoid dist.scatter failure
* convert boolean to int8 before scatter and revert it after scatter
* refine code
* fix test
* refine code
* move test utilities into utils.py
* update comment
* fix empty data
* update
* update
* fix empty data issue
* release unnecessary mem
* release unnecessary mem
* release unnecessary mem
* release unnecessary mem
* release unnecessary mem
* remove unnecessary shuffle data
* separate array_split into standalone utility
* add example
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
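Splitting data into an arbitrary number of parts, including the empty-data case, can be sketched as below. This is not the standalone utility added by the commit: `split_sizes` and `array_split` are hypothetical helpers written in plain Python. They assume the sizing convention of `numpy.array_split`, where the first `total % num_parts` parts each receive one extra row.

```python
def split_sizes(total, num_parts):
    """Return the size of each part when `total` rows go into `num_parts` parts.

    The first `total % num_parts` parts get one extra row, so sizes differ by
    at most one. Parts may be empty when total < num_parts.
    """
    if num_parts <= 0:
        raise ValueError("num_parts must be positive")
    base, extra = divmod(total, num_parts)
    return [base + 1 if i < extra else base for i in range(num_parts)]


def array_split(rows, num_parts):
    """Split a sequence into `num_parts` contiguous parts, preserving order."""
    parts = []
    start = 0
    for size in split_sizes(len(rows), num_parts):
        parts.append(rows[start:start + size])
        start += size
    return parts


# Ten rows into three parts: the remainder row goes to the first part.
chunks = array_split(list(range(10)), 3)

# Empty input still yields the requested number of (empty) parts.
empty_chunks = array_split([], 3)
```

Returning exactly `num_parts` parts even for empty input keeps the number of chunks independent of the data size, so a consumer that expects one chunk per part does not need a special case.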
-