Commit ec17136e authored by Mufei Li, committed by GitHub

[Deprecation] Examples (#4751)



* Update from master (#4584)

* [Example][Refactor] Refactor graphsage multigpu and full-graph example (#4430)

* Add refactors for multi-gpu and full-graph example

* Fix format

* Update

* Update

* Update

* [Cleanup] Remove async_transferer (#4505)

* Remove async_transferer

* remove test

* Remove AsyncTransferer
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>

* [Cleanup] Remove duplicate entries of CUB submodule (issue #4395) (#4499)

* remove third_part/cub

* remove from third_party
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>

* [Bug] Enable turning libxsmm on/off at runtime (#4455)

* enable turning libxsmm on/off at runtime by adding a global config and related API
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>

* [Feature] Unify the cuda stream used in core library (#4480)

* Use an internal cuda stream for CopyDataFromTo

* small fix white space

* Fix to compile

* Make stream optional in copydata for compile

* fix lint issue

* Update cub functions to use internal stream

* Lint check

* Update CopyTo/CopyFrom/CopyFromTo to use internal stream

* Address comments

* Fix backward CUDA stream

* Avoid overloading CopyFromTo()

* Minor comment update

* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>

* [Feature] Added exclude_self and output_batch to knn graph construction (Issues #4323 #4316) (#4389)

* * Added "exclude_self" and "output_batch" options to knn_graph and segmented_knn_graph
* Updated out-of-date comments on remove_edges and remove_self_loop, since they now preserve batch information

* * Changed defaults on new knn_graph and segmented_knn_graph function parameters, for compatibility; pytorch/test_geometry.py was failing

* * Added test to ensure dgl.remove_self_loop function correctly updates batch information

* * Added new knn_graph and segmented_knn_graph parameters to dgl.nn.KNNGraph and dgl.nn.SegmentedKNNGraph

* * Formatting

* * Oops, I missed the one in segmented_knn_graph when I fixed the similar thing in knn_graph

* * Fixed edge case handling when invalid k specified, since it still needs to be handled consistently for tests to pass
* Fixed context of batch info, since it must match the context of the input position data for remove_self_loop to succeed

* * Fixed batch info resulting from knn_graph when output_batch is true, for case of 3D input tensor, representing multiple segments

* * Added testing of new exclude_self and output_batch parameters on knn_graph and segmented_knn_graph, and their wrappers, KNNGraph and SegmentedKNNGraph, into the test_knn_cuda test

* * Added doc comments for new parameters

* * Added correct handling for uncommon case of k or more coincident points when excluding self edges in knn_graph and segmented_knn_graph
* Added test cases for more than k coincident points

* * Updated doc comments for output_batch parameters for clarity

* * Linter formatting fixes

* * Extracted out common function for test_knn_cpu and test_knn_cuda, to add the new test cases to test_knn_cpu

* * Rewording in doc comments

* * Removed output_batch parameter from knn_graph and segmented_knn_graph, in favour of always setting the batch information, except in knn_graph if x is a 2D tensor
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [CI] only known devs are authorized to trigger CI (#4518)

* [CI] only known devs are authorized to trigger CI

* fix if author is null

* add comments

* [Readability] Auto fix setup.py and update-version.py (#4446)

* Auto fix update-version

* Auto fix setup.py

* Auto fix update-version

* Auto fix setup.py

* [Doc] Change random.py to random_partition.py in guide on distributed partition pipeline (#4438)

* Update distributed-preprocessing.rst

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* fix unpinning when tensoradaptor is not available (#4450)

* [Doc] fix print issue in tutorial (#4459)

* [Example][Refactor] Refactor RGCN example (#4327)

* Refactor full graph entity classification

* Refactor rgcn with sampling

* README update

* Update

* Results update

* Respect default setting of self_loop=false in entity.py

* Update

* Update README

* Update for multi-gpu

* Update

* [doc] fix invalid link in user guide (#4468)

* [Example] directional_GSN for ogbg-molpcba (#4405)

* version-1

* version-2

* version-3

* update examples/README

* Update .gitignore

* update performance in README, delete scripts

* 1st approving review

* 2nd approving review
Co-authored-by: Mufei Li <mufeili1996@gmail.com>

* Clarify the message name, which is 'm'. (#4462)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

* [Refactor] Auto fix view.py. (#4461)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [Example] SEAL for OGBL (#4291)

* [Example] SEAL for OGBL

* update index

* update

* fix readme typo

* add seal sampler

* modify set ops

* prefetch

* efficiency test

* update

* optimize

* fix ScatterAdd dtype issue

* update sampler style

* update
Co-authored-by: Quan Gan <coin2028@hotmail.com>

* [CI] use https instead of http (#4488)

* [BugFix] fix crash due to incorrect dtype in dgl.to_block() (#4487)

* [BugFix] fix crash due to incorrect dtype in dgl.to_block()

* fix test failure in TF

* [Feature] Make TensorAdapter Stream Aware (#4472)

* Allocate tensors in DGL's current stream

* make tensoradaptor stream-aware

* replace TAempty with cpu allocator

* fix typo

* try fix cpu allocation

* clean header

* redirect AllocDataSpace as well

* resolve comments

* [Build][Doc] Specify the sphinx version (#4465)
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* reformat

* reformat

* Auto fix update-version

* Auto fix setup.py

* reformat

* reformat
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>

* Move mock version of dgl_sparse library to DGL main repo (#4524)

* init

* Add api doc for sparse library

* support ops between matrices with different sparsity

* Fixed docstring

* addresses comments

* lint check

* change keyword format to fmt
Co-authored-by: Israt Nisa <nisisrat@amazon.com>

* [DistPart] expose timeout config for process group (#4532)

* [DistPart] expose timeout config for process group

* refine code

* Update tools/distpartitioning/data_proc_pipeline.py
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [Feature] Import PyTorch's CUDA stream management (#4503)

* add set_stream

* add .record_stream for NDArray and HeteroGraph

* refactor dgl stream Python APIs

* test record_stream

* add unit test for record stream

* use pytorch's stream

* fix lint

* fix cpu build

* address comments

* address comments

* add record stream tests for dgl.graph

* record frames and update dataloder

* add docstring

* update frame

* add backend check for record_stream

* remove CUDAThreadEntry::stream

* record stream for newly created formats

* fix bug

* fix cpp test

* fix None c_void_p to c_handle

* [Examples] Reduce memory consumption (#4558)

* [Examples] Reduce memory consumption

* refine help message

* refine

* [Feature][REVIEW] Enable DGL cugraph nightly CI (#4525)

* Added cugraph nightly scripts

* Removed nvcr.io//nvidia/pytorch:22.04-py3 reference
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

* Revert "[Feature][REVIEW] Enable DGL cugraph nightly CI (#4525)" (#4563)

This reverts commit ec171c64.

* [Misc] Add flake8 lint workflow. (#4566)

* Add pyproject.toml for autopep8.

* Add pyproject.toml for autopep8.

* Add flake8 annotation in workflow.

* remove

* add

* clean up
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Try use official pylint workflow. (#4568)

* polish update_version

* update pylint workflow.

* add

* revert.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [CI] refine stage logic (#4565)

* [CI] refine stage logic

* refine

* refine

* remove (#4570)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Add Pylint workflow for flake8. (#4571)

* remove

* Add pylint.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Update the python version in Pylint workflow for flake8. (#4572)

* remove

* Add pylint.

* Change the python version for pylint.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint. (#4574)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Use another workflow. (#4575)

* Update pylint.

* Use another workflow.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint. (#4576)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint.yml

* Update pylint.yml

* Delete pylint.yml

* [Misc]Add pyproject.toml for autopep8 & black. (#4543)

* Add pyproject.toml for autopep8.

* Add pyproject.toml for autopep8.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454)

* rename `DLContext` to `DGLContext`

* rename `kDLGPU` to `kDLCUDA`

* replace DLTensor with DGLArray

* fix linting

* Unify DGLType and DLDataType to DGLDataType

* Fix FFI

* rename DLDeviceType to DGLDeviceType

* decouple dlpack from the core library

* fix bug

* fix lint

* fix merge

* fix build

* address comments

* rename dl_converter to dlpack_convert

* remove redundant comments
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>

* [Deprecation] Dataset Attributes (#4546)

* Update

* CI

* CI

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* [Example] Bug Fix (#4665)

* Update

* CI

* CI

* Update

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* Update

* Update (#4724)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>
# Graph Convolutional Matrix Completion
Paper link: [https://arxiv.org/abs/1706.02263](https://arxiv.org/abs/1706.02263)
Author's code: [https://github.com/riannevdberg/gc-mc](https://github.com/riannevdberg/gc-mc)
The implementation does not handle side-channel features and mini-epoching and thus achieves
slightly worse performance when using node features.
Credit: Jiani Zhang ([@jennyzhang0215](https://github.com/jennyzhang0215))
## Dependencies
* MXNet 1.5.0+
* pandas
* gluonnlp
## Data
Supported datasets: ml-100k, ml-1m, ml-10m
## How to run
ml-100k, no feature
```bash
DGLBACKEND=mxnet python train.py --data_name=ml-100k --use_one_hot_fea --gcn_agg_accum=stack
```
Results: RMSE=0.9077 (0.910 reported)
Speed: 0.0246s/epoch (vanilla implementation: 0.1008s/epoch)
ml-100k, with feature
```bash
DGLBACKEND=mxnet python train.py --data_name=ml-100k --gcn_agg_accum=stack
```
Results: RMSE=0.9495 (0.905 reported)
ml-1m, no feature
```bash
DGLBACKEND=mxnet python train.py --data_name=ml-1m --gcn_agg_accum=sum --use_one_hot_fea
```
Results: RMSE=0.8377 (0.832 reported)
Speed: 0.0695s/epoch (vanilla implementation: 1.538s/epoch)
ml-10m, no feature
```bash
DGLBACKEND=mxnet python train.py --data_name=ml-10m --gcn_agg_accum=stack --gcn_dropout=0.3 \
--train_lr=0.001 --train_min_lr=0.0001 --train_max_iter=15000 \
--use_one_hot_fea --gen_r_num_basis_func=4
```
Results: RMSE=0.7875 (0.777 reported)
Speed: 0.6480s/epoch (vanilla implementation: OOM)
Testbed: EC2 p3.2xlarge
"""MovieLens dataset"""
import numpy as np
import os
import re
import pandas as pd
import scipy.sparse as sp
import gluonnlp as nlp
import mxnet as mx
import dgl
from dgl.data.utils import download, extract_archive, get_download_dir
_urls = {
'ml-100k' : 'http://files.grouplens.org/datasets/movielens/ml-100k.zip',
'ml-1m' : 'http://files.grouplens.org/datasets/movielens/ml-1m.zip',
'ml-10m' : 'http://files.grouplens.org/datasets/movielens/ml-10m.zip',
}
READ_DATASET_PATH = get_download_dir()
GENRES_ML_100K =\
['unknown', 'Action', 'Adventure', 'Animation',
'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi',
'Thriller', 'War', 'Western']
GENRES_ML_1M = GENRES_ML_100K[1:]
GENRES_ML_10M = GENRES_ML_100K + ['IMAX']
class MovieLens(object):
"""MovieLens dataset used by GCMC model
TODO(minjie): make this dataset more general
The dataset stores MovieLens ratings in two types of graphs. The encoder graph
contains rating value information in the form of edge types. The decoder graph
stores plain user-movie pairs in the form of a bipartite graph with no rating
information. All graphs have two types of nodes: "user" and "movie".
The training, validation and test set can be summarized as follows:
training_enc_graph : training user-movie pairs + rating info
training_dec_graph : training user-movie pairs
valid_enc_graph : training user-movie pairs + rating info
valid_dec_graph : validation user-movie pairs
test_enc_graph : training user-movie pairs + validation user-movie pairs + rating info
test_dec_graph : test user-movie pairs
Attributes
----------
train_enc_graph : dgl.DGLHeteroGraph
Encoder graph for training.
train_dec_graph : dgl.DGLHeteroGraph
Decoder graph for training.
train_labels : mx.nd.NDArray
The categorical label of each user-movie pair
train_truths : mx.nd.NDArray
The actual rating values of each user-movie pair
valid_enc_graph : dgl.DGLHeteroGraph
Encoder graph for validation.
valid_dec_graph : dgl.DGLHeteroGraph
Decoder graph for validation.
valid_labels : mx.nd.NDArray
The categorical label of each user-movie pair
valid_truths : mx.nd.NDArray
The actual rating values of each user-movie pair
test_enc_graph : dgl.DGLHeteroGraph
Encoder graph for test.
test_dec_graph : dgl.DGLHeteroGraph
Decoder graph for test.
test_labels : mx.nd.NDArray
The categorical label of each user-movie pair
test_truths : mx.nd.NDArray
The actual rating values of each user-movie pair
user_feature : mx.nd.NDArray
User feature tensor. If None, an identity matrix is used.
movie_feature : mx.nd.NDArray
Movie feature tensor. If None, an identity matrix is used.
possible_rating_values : np.ndarray
Available rating values in the dataset
Parameters
----------
name : str
Dataset name. Could be "ml-100k", "ml-1m", "ml-10m"
ctx : mx.context.Context
Device context
use_one_hot_fea : bool, optional
If true, the ``user_feature`` attribute is None, representing a one-hot identity
matrix. (Default: False)
symm : bool, optional
If true, use the symmetric normalization constant. Otherwise, use the left
normalization constant. (Default: True)
test_ratio : float, optional
Ratio of test data
valid_ratio : float, optional
Ratio of validation data
"""
def __init__(self, name, ctx, use_one_hot_fea=False, symm=True,
test_ratio=0.1, valid_ratio=0.1):
self._name = name
self._ctx = ctx
self._symm = symm
self._test_ratio = test_ratio
self._valid_ratio = valid_ratio
# download and extract
download_dir = get_download_dir()
zip_file_path = '{}/{}.zip'.format(download_dir, name)
download(_urls[name], path=zip_file_path)
extract_archive(zip_file_path, '{}/{}'.format(download_dir, name))
if name == 'ml-10m':
root_folder = 'ml-10M100K'
else:
root_folder = name
self._dir = os.path.join(download_dir, name, root_folder)
print("Starting processing {} ...".format(self._name))
self._load_raw_user_info()
self._load_raw_movie_info()
print('......')
if self._name == 'ml-100k':
self.all_train_rating_info = self._load_raw_rates(os.path.join(self._dir, 'u1.base'), '\t')
self.test_rating_info = self._load_raw_rates(os.path.join(self._dir, 'u1.test'), '\t')
self.all_rating_info = pd.concat([self.all_train_rating_info, self.test_rating_info])
elif self._name == 'ml-1m' or self._name == 'ml-10m':
self.all_rating_info = self._load_raw_rates(os.path.join(self._dir, 'ratings.dat'), '::')
num_test = int(np.ceil(self.all_rating_info.shape[0] * self._test_ratio))
shuffled_idx = np.random.permutation(self.all_rating_info.shape[0])
self.test_rating_info = self.all_rating_info.iloc[shuffled_idx[: num_test]]
self.all_train_rating_info = self.all_rating_info.iloc[shuffled_idx[num_test: ]]
else:
raise NotImplementedError
print('......')
num_valid = int(np.ceil(self.all_train_rating_info.shape[0] * self._valid_ratio))
shuffled_idx = np.random.permutation(self.all_train_rating_info.shape[0])
self.valid_rating_info = self.all_train_rating_info.iloc[shuffled_idx[: num_valid]]
self.train_rating_info = self.all_train_rating_info.iloc[shuffled_idx[num_valid: ]]
self.possible_rating_values = np.unique(self.train_rating_info["rating"].values)
print("All rating pairs : {}".format(self.all_rating_info.shape[0]))
print("\tAll train rating pairs : {}".format(self.all_train_rating_info.shape[0]))
print("\t\tTrain rating pairs : {}".format(self.train_rating_info.shape[0]))
print("\t\tValid rating pairs : {}".format(self.valid_rating_info.shape[0]))
print("\tTest rating pairs : {}".format(self.test_rating_info.shape[0]))
self.user_info = self._drop_unseen_nodes(orign_info=self.user_info,
cmp_col_name="id",
reserved_ids_set=set(self.all_rating_info["user_id"].values),
label="user")
self.movie_info = self._drop_unseen_nodes(orign_info=self.movie_info,
cmp_col_name="id",
reserved_ids_set=set(self.all_rating_info["movie_id"].values),
label="movie")
# Map user/movie to the global id
self.global_user_id_map = {ele: i for i, ele in enumerate(self.user_info['id'])}
self.global_movie_id_map = {ele: i for i, ele in enumerate(self.movie_info['id'])}
print('Total user number = {}, movie number = {}'.format(len(self.global_user_id_map),
len(self.global_movie_id_map)))
self._num_user = len(self.global_user_id_map)
self._num_movie = len(self.global_movie_id_map)
### Generate features
if use_one_hot_fea:
self.user_feature = None
self.movie_feature = None
else:
self.user_feature = mx.nd.array(self._process_user_fea(), ctx=ctx, dtype=np.float32)
self.movie_feature = mx.nd.array(self._process_movie_fea(), ctx=ctx, dtype=np.float32)
if self.user_feature is None:
self.user_feature_shape = (self.num_user, self.num_user)
self.movie_feature_shape = (self.num_movie, self.num_movie)
else:
self.user_feature_shape = self.user_feature.shape
self.movie_feature_shape = self.movie_feature.shape
info_line = "Feature dim: "
info_line += "\nuser: {}".format(self.user_feature_shape)
info_line += "\nmovie: {}".format(self.movie_feature_shape)
print(info_line)
all_train_rating_pairs, all_train_rating_values = self._generate_pair_value(self.all_train_rating_info)
train_rating_pairs, train_rating_values = self._generate_pair_value(self.train_rating_info)
valid_rating_pairs, valid_rating_values = self._generate_pair_value(self.valid_rating_info)
test_rating_pairs, test_rating_values = self._generate_pair_value(self.test_rating_info)
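# _make_labels converts raw rating values into categorical class indices; this relies on
# self.possible_rating_values being sorted, which np.unique above guarantees.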
def _make_labels(ratings):
labels = mx.nd.array(np.searchsorted(self.possible_rating_values, ratings),
ctx=ctx, dtype=np.int32)
return labels
self.train_enc_graph = self._generate_enc_graph(train_rating_pairs, train_rating_values, add_support=True)
self.train_dec_graph = self._generate_dec_graph(train_rating_pairs)
self.train_labels = _make_labels(train_rating_values)
self.train_truths = mx.nd.array(train_rating_values, ctx=ctx, dtype=np.float32)
self.valid_enc_graph = self.train_enc_graph
self.valid_dec_graph = self._generate_dec_graph(valid_rating_pairs)
self.valid_labels = _make_labels(valid_rating_values)
self.valid_truths = mx.nd.array(valid_rating_values, ctx=ctx, dtype=np.float32)
self.test_enc_graph = self._generate_enc_graph(all_train_rating_pairs, all_train_rating_values, add_support=True)
self.test_dec_graph = self._generate_dec_graph(test_rating_pairs)
self.test_labels = _make_labels(test_rating_values)
self.test_truths = mx.nd.array(test_rating_values, ctx=ctx, dtype=np.float32)
def _npairs(graph):
rst = 0
for r in self.possible_rating_values:
rst += graph.number_of_edges(str(r))
return rst
print("Train enc graph: \t#user:{}\t#movie:{}\t#pairs:{}".format(
self.train_enc_graph.number_of_nodes('user'), self.train_enc_graph.number_of_nodes('movie'),
_npairs(self.train_enc_graph)))
print("Train dec graph: \t#user:{}\t#movie:{}\t#pairs:{}".format(
self.train_dec_graph.number_of_nodes('user'), self.train_dec_graph.number_of_nodes('movie'),
self.train_dec_graph.number_of_edges()))
print("Valid enc graph: \t#user:{}\t#movie:{}\t#pairs:{}".format(
self.valid_enc_graph.number_of_nodes('user'), self.valid_enc_graph.number_of_nodes('movie'),
_npairs(self.valid_enc_graph)))
print("Valid dec graph: \t#user:{}\t#movie:{}\t#pairs:{}".format(
self.valid_dec_graph.number_of_nodes('user'), self.valid_dec_graph.number_of_nodes('movie'),
self.valid_dec_graph.number_of_edges()))
print("Test enc graph: \t#user:{}\t#movie:{}\t#pairs:{}".format(
self.test_enc_graph.number_of_nodes('user'), self.test_enc_graph.number_of_nodes('movie'),
_npairs(self.test_enc_graph)))
print("Test dec graph: \t#user:{}\t#movie:{}\t#pairs:{}".format(
self.test_dec_graph.number_of_nodes('user'), self.test_dec_graph.number_of_nodes('movie'),
self.test_dec_graph.number_of_edges()))
def _generate_pair_value(self, rating_info):
rating_pairs = (np.array([self.global_user_id_map[ele] for ele in rating_info["user_id"]],
dtype=np.int64),
np.array([self.global_movie_id_map[ele] for ele in rating_info["movie_id"]],
dtype=np.int64))
rating_values = rating_info["rating"].values.astype(np.float32)
return rating_pairs, rating_values
def _generate_enc_graph(self, rating_pairs, rating_values, add_support=False):
user_movie_R = np.zeros((self._num_user, self._num_movie), dtype=np.float32)
user_movie_R[rating_pairs] = rating_values
data_dict = dict()
num_nodes_dict = {'user': self._num_user, 'movie': self._num_movie}
rating_row, rating_col = rating_pairs
for rating in self.possible_rating_values:
ridx = np.where(rating_values == rating)
rrow = rating_row[ridx]
rcol = rating_col[ridx]
data_dict.update({
('user', str(rating), 'movie'): (rrow, rcol),
('movie', 'rev-%s' % str(rating), 'user'): (rcol, rrow)
})
graph = dgl.heterograph(data_dict, num_nodes_dict=num_nodes_dict)
# sanity check
assert len(rating_pairs[0]) == sum([graph.number_of_edges(et) for et in graph.etypes]) // 2
if add_support:
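# The 'ci'/'cj' values computed below are the 1/sqrt(degree) normalization constants c_i and c_j
# of GCMC, accumulated over all rating types; with symm=False only the in-degree ('ci')
# normalization is effectively used.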
def _calc_norm(x):
x = x.asnumpy().astype('float32')
x[x == 0.] = np.inf
x = mx.nd.array(1. / np.sqrt(x))
return x.expand_dims(1)
user_ci = []
user_cj = []
movie_ci = []
movie_cj = []
for r in self.possible_rating_values:
r = str(r)
user_ci.append(graph['rev-%s' % r].in_degrees())
movie_ci.append(graph[r].in_degrees())
if self._symm:
user_cj.append(graph[r].out_degrees())
movie_cj.append(graph['rev-%s' % r].out_degrees())
else:
user_cj.append(mx.nd.zeros((self.num_user,)))
movie_cj.append(mx.nd.zeros((self.num_movie,)))
user_ci = _calc_norm(mx.nd.add_n(*user_ci))
movie_ci = _calc_norm(mx.nd.add_n(*movie_ci))
if self._symm:
user_cj = _calc_norm(mx.nd.add_n(*user_cj))
movie_cj = _calc_norm(mx.nd.add_n(*movie_cj))
else:
user_cj = mx.nd.ones((self.num_user,))
movie_cj = mx.nd.ones((self.num_movie,))
graph.nodes['user'].data.update({'ci' : user_ci, 'cj' : user_cj})
graph.nodes['movie'].data.update({'ci' : movie_ci, 'cj' : movie_cj})
return graph
def _generate_dec_graph(self, rating_pairs):
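# The decoder graph is a plain bipartite user-movie graph with a single 'rate' edge type:
# one edge per rating pair, with no rating value stored on the edges.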
ones = np.ones_like(rating_pairs[0])
user_movie_ratings_coo = sp.coo_matrix(
(ones, rating_pairs),
shape=(self.num_user, self.num_movie), dtype=np.float32)
g = dgl.bipartite_from_scipy(user_movie_ratings_coo, utype='_U', etype='_E', vtype='_V')
return dgl.heterograph({('user', 'rate', 'movie'): g.edges()},
num_nodes_dict={'user': self.num_user, 'movie': self.num_movie})
@property
def num_links(self):
return self.possible_rating_values.size
@property
def num_user(self):
return self._num_user
@property
def num_movie(self):
return self._num_movie
def _drop_unseen_nodes(self, orign_info, cmp_col_name, reserved_ids_set, label):
# print(" -----------------")
# print("{}: {}(reserved) v.s. {}(from info)".format(label, len(reserved_ids_set),
# len(set(orign_info[cmp_col_name].values))))
if reserved_ids_set != set(orign_info[cmp_col_name].values):
pd_rating_ids = pd.DataFrame(list(reserved_ids_set), columns=["id_graph"])
# print("\torign_info: ({}, {})".format(orign_info.shape[0], orign_info.shape[1]))
data_info = orign_info.merge(pd_rating_ids, left_on=cmp_col_name, right_on='id_graph', how='outer')
data_info = data_info.dropna(subset=[cmp_col_name, 'id_graph'])
data_info = data_info.drop(columns=["id_graph"])
data_info = data_info.reset_index(drop=True)
# print("\tAfter dropping, data shape: ({}, {})".format(data_info.shape[0], data_info.shape[1]))
return data_info
else:
orign_info = orign_info.reset_index(drop=True)
return orign_info
def _load_raw_rates(self, file_path, sep):
"""In MovieLens, the rates have the following format
ml-100k
user id \t movie id \t rating \t timestamp
ml-1m/10m
UserID::MovieID::Rating::Timestamp
timestamp is unix timestamp and can be converted by pd.to_datetime(X, unit='s')
Parameters
----------
file_path : str
Returns
-------
rating_info : pd.DataFrame
"""
rating_info = pd.read_csv(
file_path, sep=sep, header=None,
names=['user_id', 'movie_id', 'rating', 'timestamp'],
dtype={'user_id': np.int32, 'movie_id' : np.int32,
'rating': np.float32, 'timestamp': np.int64}, engine='python')
return rating_info
def _load_raw_user_info(self):
"""In MovieLens, the user attributes file have the following formats:
ml-100k:
user id | age | gender | occupation | zip code
ml-1m:
UserID::Gender::Age::Occupation::Zip-code
For ml-10m, there is no user information. We read the user id from the rating file.
Parameters
----------
name : str
Returns
-------
user_info : pd.DataFrame
"""
if self._name == 'ml-100k':
self.user_info = pd.read_csv(os.path.join(self._dir, 'u.user'), sep='|', header=None,
names=['id', 'age', 'gender', 'occupation', 'zip_code'], engine='python')
elif self._name == 'ml-1m':
self.user_info = pd.read_csv(os.path.join(self._dir, 'users.dat'), sep='::', header=None,
names=['id', 'gender', 'age', 'occupation', 'zip_code'], engine='python')
elif self._name == 'ml-10m':
rating_info = pd.read_csv(
os.path.join(self._dir, 'ratings.dat'), sep='::', header=None,
names=['user_id', 'movie_id', 'rating', 'timestamp'],
dtype={'user_id': np.int32, 'movie_id': np.int32, 'rating': np.float32,
'timestamp': np.int64}, engine='python')
self.user_info = pd.DataFrame(np.unique(rating_info['user_id'].values.astype(np.int32)),
columns=['id'])
else:
raise NotImplementedError
def _process_user_fea(self):
"""
Parameters
----------
user_info : pd.DataFrame
name : str
For ml-100k and ml-1m, the column name is ['id', 'gender', 'age', 'occupation', 'zip_code'].
We take the age, gender, and the one-hot encoding of the occupation as the user features.
For ml-10m, there is no user feature and we set the feature to be a single zero.
Returns
-------
user_features : np.ndarray
"""
if self._name == 'ml-100k' or self._name == 'ml-1m':
ages = self.user_info['age'].values.astype(np.float32)
gender = (self.user_info['gender'] == 'F').values.astype(np.float32)
all_occupations = set(self.user_info['occupation'])
occupation_map = {ele: i for i, ele in enumerate(all_occupations)}
occupation_one_hot = np.zeros(shape=(self.user_info.shape[0], len(all_occupations)),
dtype=np.float32)
occupation_one_hot[np.arange(self.user_info.shape[0]),
np.array([occupation_map[ele] for ele in self.user_info['occupation']])] = 1
user_features = np.concatenate([ages.reshape((self.user_info.shape[0], 1)) / 50.0,
gender.reshape((self.user_info.shape[0], 1)),
occupation_one_hot], axis=1)
elif self._name == 'ml-10m':
user_features = np.zeros(shape=(self.user_info.shape[0], 1), dtype=np.float32)
else:
raise NotImplementedError
return user_features
def _load_raw_movie_info(self):
"""In MovieLens, the movie attributes may have the following formats:
In ml_100k:
movie id | movie title | release date | video release date | IMDb URL | [genres]
In ml_1m, ml_10m:
MovieID::Title (Release Year)::Genres
Also, Genres are separated by |, e.g., Adventure|Animation|Children|Comedy|Fantasy
Parameters
----------
name : str
Returns
-------
movie_info : pd.DataFrame
For ml-100k, the column names are ['id', 'title', 'release_date', 'video_release_date', 'url'] + [GENRES (19)]
For ml-1m and ml-10m, the column names are ['id', 'title'] + [GENRES (18/20)]
"""
if self._name == 'ml-100k':
GENRES = GENRES_ML_100K
elif self._name == 'ml-1m':
GENRES = GENRES_ML_1M
elif self._name == 'ml-10m':
GENRES = GENRES_ML_10M
else:
raise NotImplementedError
if self._name == 'ml-100k':
file_path = os.path.join(self._dir, 'u.item')
self.movie_info = pd.read_csv(file_path, sep='|', header=None,
names=['id', 'title', 'release_date', 'video_release_date', 'url'] + GENRES,
engine='python')
elif self._name == 'ml-1m' or self._name == 'ml-10m':
file_path = os.path.join(self._dir, 'movies.dat')
movie_info = pd.read_csv(file_path, sep='::', header=None,
names=['id', 'title', 'genres'], engine='python')
genre_map = {ele: i for i, ele in enumerate(GENRES)}
genre_map['Children\'s'] = genre_map['Children']
genre_map['Childrens'] = genre_map['Children']
movie_genres = np.zeros(shape=(movie_info.shape[0], len(GENRES)), dtype=np.float32)
for i, genres in enumerate(movie_info['genres']):
for ele in genres.split('|'):
if ele in genre_map:
movie_genres[i, genre_map[ele]] = 1.0
else:
print('genres not found, filled with unknown: {}'.format(genres))
movie_genres[i, genre_map['unknown']] = 1.0
for idx, genre_name in enumerate(GENRES):
assert idx == genre_map[genre_name]
movie_info[genre_name] = movie_genres[:, idx]
self.movie_info = movie_info.drop(columns=["genres"])
else:
raise NotImplementedError
def _process_movie_fea(self):
"""
Parameters
----------
movie_info : pd.DataFrame
name : str
Returns
-------
movie_features : np.ndarray
Movie features generated by concatenating the title embedding, the normalized release year and the genre indicators
"""
if self._name == 'ml-100k':
GENRES = GENRES_ML_100K
elif self._name == 'ml-1m':
GENRES = GENRES_ML_1M
elif self._name == 'ml-10m':
GENRES = GENRES_ML_10M
else:
raise NotImplementedError
word_embedding = nlp.embedding.GloVe('glove.840B.300d')
tokenizer = nlp.data.transforms.SpacyTokenizer()
title_embedding = np.zeros(shape=(self.movie_info.shape[0], 300), dtype=np.float32)
release_years = np.zeros(shape=(self.movie_info.shape[0], 1), dtype=np.float32)
p = re.compile(r'(.+)\s*\((\d+)\)')
for i, title in enumerate(self.movie_info['title']):
match_res = p.match(title)
if match_res is None:
print('{} cannot be matched, index={}, name={}'.format(title, i, self._name))
title_context, year = title, 1950
else:
title_context, year = match_res.groups()
# We use the average of the GloVe word vectors of the title tokens
title_embedding[i, :] = word_embedding[tokenizer(title_context)].asnumpy().mean(axis=0)
release_years[i] = float(year)
movie_features = np.concatenate((title_embedding,
(release_years - 1950.0) / 100.0,
self.movie_info[GENRES]),
axis=1)
return movie_features
if __name__ == '__main__':
MovieLens("ml-100k", ctx=mx.cpu(), symm=True)
"""NN modules"""
import math
import numpy as np
import mxnet as mx
import mxnet.ndarray as F
from mxnet.gluon import nn, Block
import dgl.function as fn
from utils import get_activation
class GCMCLayer(Block):
r"""GCMC layer
.. math::
z_i^{(l+1)} = \sigma_{agg}\left[\mathrm{agg}\left(
\sum_{j\in\mathcal{N}_{i,1}}\frac{1}{c_{ij}}W_1h_j, \ldots,
\sum_{j\in\mathcal{N}_{i,R}}\frac{1}{c_{ij}}W_Rh_j
\right)\right]
After that, apply an extra output projection:
.. math::
h_i^{(l+1)} = \sigma_{out}\left(W_o z_i^{(l+1)}\right)
The equation is applied to both user nodes and movie nodes and the parameters
are not shared unless ``share_user_item_param`` is true.
Parameters
----------
rating_vals : list of int or float
Possible rating values.
user_in_units : int
Size of user input feature
movie_in_units : int
Size of movie input feature
msg_units : int
Size of message :math:`W_rh_j`
out_units : int
Size of the final output user and movie features
dropout_rate : float, optional
Dropout rate (Default: 0.0)
agg : str, optional
Function to aggregate messages of different ratings.
Could be any of the supported cross type reducers:
"sum", "max", "min", "mean", "stack".
(Default: "stack")
agg_act : callable, str, optional
Activation function :math:`\sigma_{agg}`. (Default: None)
out_act : callable, str, optional
Activation function :math:`\sigma_{out}`. (Default: None)
share_user_item_param : bool, optional
If true, user node and movie node share the same set of parameters.
Require ``user_in_units`` and ``movie_in_units`` to be the same.
(Default: False)
"""
def __init__(self,
rating_vals,
user_in_units,
movie_in_units,
msg_units,
out_units,
dropout_rate=0.0,
agg='stack', # or 'sum'
agg_act=None,
out_act=None,
share_user_item_param=False):
super(GCMCLayer, self).__init__()
self.rating_vals = rating_vals
self.agg = agg
self.share_user_item_param = share_user_item_param
if agg == 'stack':
# divide the original msg unit size by number of ratings to keep
# the dimensionality
assert msg_units % len(rating_vals) == 0
msg_units = msg_units // len(rating_vals)
with self.name_scope():
self.dropout = nn.Dropout(dropout_rate)
self.W_r = {}
for rating in rating_vals:
rating = str(rating)
if share_user_item_param and user_in_units == movie_in_units:
self.W_r[rating] = self.params.get(
'W_r_%s' % rating, shape=(user_in_units, msg_units),
dtype=np.float32, allow_deferred_init=True)
self.W_r['rev-%s' % rating] = self.W_r[rating]
else:
self.W_r[rating] = self.params.get(
'W_r_%s' % rating, shape=(user_in_units, msg_units),
dtype=np.float32, allow_deferred_init=True)
self.W_r['rev-%s' % rating] = self.params.get(
'revW_r_%s' % rating, shape=(movie_in_units, msg_units),
dtype=np.float32, allow_deferred_init=True)
self.ufc = nn.Dense(out_units)
if share_user_item_param:
self.ifc = self.ufc
else:
self.ifc = nn.Dense(out_units)
self.agg_act = get_activation(agg_act)
self.out_act = get_activation(out_act)
def forward(self, graph, ufeat=None, ifeat=None):
"""Forward function
Normalizer constant :math:`c_{ij}` is stored as two node data "ci"
and "cj".
Parameters
----------
graph : DGLHeteroGraph
User-movie rating graph. It should contain two node types: "user"
and "movie", and one edge type per rating value.
ufeat : mx.nd.NDArray, optional
User features. If None, an identity matrix is used.
ifeat : mx.nd.NDArray, optional
Movie features. If None, an identity matrix is used.
Returns
-------
new_ufeat : mx.nd.NDArray
New user features
new_ifeat : mx.nd.NDArray
New movie features
"""
num_u = graph.number_of_nodes('user')
num_i = graph.number_of_nodes('movie')
funcs = {}
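# Build one (message, reduce) pair per rating relation and its reverse; multi_update_all below
# then combines the per-relation results with the cross-type reducer self.agg ('stack' or 'sum').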
for i, rating in enumerate(self.rating_vals):
rating = str(rating)
# W_r * x
x_u = dot_or_identity(ufeat, self.W_r[rating].data())
x_i = dot_or_identity(ifeat, self.W_r['rev-%s' % rating].data())
# left norm and dropout
x_u = x_u * self.dropout(graph.nodes['user'].data['cj'])
x_i = x_i * self.dropout(graph.nodes['movie'].data['cj'])
graph.nodes['user'].data['h%d' % i] = x_u
graph.nodes['movie'].data['h%d' % i] = x_i
funcs[rating] = (fn.copy_u('h%d' % i, 'm'), fn.sum('m', 'h'))
funcs['rev-%s' % rating] = (fn.copy_u('h%d' % i, 'm'), fn.sum('m', 'h'))
# message passing
graph.multi_update_all(funcs, self.agg)
ufeat = graph.nodes['user'].data.pop('h').reshape((num_u, -1))
ifeat = graph.nodes['movie'].data.pop('h').reshape((num_i, -1))
# right norm
ufeat = ufeat * graph.nodes['user'].data['ci']
ifeat = ifeat * graph.nodes['movie'].data['ci']
# fc and non-linear
ufeat = self.agg_act(ufeat)
ifeat = self.agg_act(ifeat)
ufeat = self.dropout(ufeat)
ifeat = self.dropout(ifeat)
ufeat = self.ufc(ufeat)
ifeat = self.ifc(ifeat)
return self.out_act(ufeat), self.out_act(ifeat)
class BiDecoder(Block):
r"""Bilinear decoder.
.. math::
p(M_{ij}=r) = \text{softmax}(u_i^TQ_rv_j)
The trainable parameter :math:`Q_r` is further decomposed to a linear
combination of basis weight matrices :math:`P_s`:
.. math::
Q_r = \sum_{s=1}^{b} a_{rs}P_s
Parameters
----------
rating_vals : list of int or float
Possible rating values.
in_units : int
Size of input user and movie features
num_basis_functions : int, optional
Number of basis. (Default: 2)
dropout_rate : float, optional
Dropout rate (Default: 0.0)
"""
def __init__(self,
rating_vals,
in_units,
num_basis_functions=2,
dropout_rate=0.0):
super(BiDecoder, self).__init__()
self.rating_vals = rating_vals
self._num_basis_functions = num_basis_functions
self.dropout = nn.Dropout(dropout_rate)
self.Ps = []
with self.name_scope():
for i in range(num_basis_functions):
self.Ps.append(self.params.get(
'Ps_%d' % i, shape=(in_units, in_units),
#init=mx.initializer.Orthogonal(scale=1.1, rand_type='normal'),
init=mx.initializer.Xavier(magnitude=math.sqrt(2.0)),
allow_deferred_init=True))
self.rate_out = nn.Dense(units=len(rating_vals), flatten=False, use_bias=False)
def forward(self, graph, ufeat, ifeat):
"""Forward function.
Parameters
----------
graph : DGLHeteroGraph
"Flattened" user-movie graph with only one edge type.
ufeat : mx.nd.NDArray
User embeddings. Shape: (|V_u|, D)
ifeat : mx.nd.NDArray
Movie embeddings. Shape: (|V_m|, D)
Returns
-------
mx.nd.NDArray
Predicted scores for each user-movie edge.
"""
graph = graph.local_var()
ufeat = self.dropout(ufeat)
ifeat = self.dropout(ifeat)
graph.nodes['movie'].data['h'] = ifeat
basis_out = []
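# For each basis matrix P_s, compute the per-edge score u_i^T P_s v_j via u_dot_v;
# the Dense layer rate_out then mixes the basis scores, playing the role of the a_{rs} coefficients.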
for i in range(self._num_basis_functions):
graph.nodes['user'].data['h'] = F.dot(ufeat, self.Ps[i].data())
graph.apply_edges(fn.u_dot_v('h', 'h', 'sr'))
basis_out.append(graph.edata['sr'])
out = F.concat(*basis_out, dim=1)
out = self.rate_out(out)
return out
def dot_or_identity(A, B):
# if A is None, treat as identity matrix
if A is None:
return B
else:
return mx.nd.dot(A, B)
"""Training script"""
import os, time
import argparse
import logging
import random
import string
import numpy as np
import mxnet as mx
from mxnet import gluon
from data import MovieLens
from model import GCMCLayer, BiDecoder
from utils import get_activation, parse_ctx, gluon_net_info, gluon_total_param_num, \
params_clip_global_norm, MetricLogger
from mxnet.gluon import Block
class Net(Block):
def __init__(self, args, **kwargs):
super(Net, self).__init__(**kwargs)
self._act = get_activation(args.model_activation)
with self.name_scope():
self.encoder = GCMCLayer(args.rating_vals,
args.src_in_units,
args.dst_in_units,
args.gcn_agg_units,
args.gcn_out_units,
args.gcn_dropout,
args.gcn_agg_accum,
agg_act=self._act,
share_user_item_param=args.share_param)
self.decoder = BiDecoder(args.rating_vals,
in_units=args.gcn_out_units,
num_basis_functions=args.gen_r_num_basis_func)
def forward(self, enc_graph, dec_graph, ufeat, ifeat):
user_out, movie_out = self.encoder(
enc_graph,
ufeat,
ifeat)
pred_ratings = self.decoder(dec_graph, user_out, movie_out)
return pred_ratings
def evaluate(args, net, dataset, segment='valid'):
possible_rating_values = dataset.possible_rating_values
nd_possible_rating_values = mx.nd.array(possible_rating_values, ctx=args.ctx, dtype=np.float32)
if segment == "valid":
rating_values = dataset.valid_truths
enc_graph = dataset.valid_enc_graph
dec_graph = dataset.valid_dec_graph
elif segment == "test":
rating_values = dataset.test_truths
enc_graph = dataset.test_enc_graph
dec_graph = dataset.test_dec_graph
else:
raise NotImplementedError
# Evaluate RMSE
with mx.autograd.predict_mode():
pred_ratings = net(enc_graph, dec_graph,
dataset.user_feature, dataset.movie_feature)
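# Turn the per-class logits into an expected rating: softmax over the rating classes,
# then a weighted sum of the possible rating values.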
real_pred_ratings = (mx.nd.softmax(pred_ratings, axis=1) *
nd_possible_rating_values.reshape((1, -1))).sum(axis=1)
rmse = mx.nd.square(real_pred_ratings - rating_values).mean().asscalar()
rmse = np.sqrt(rmse)
return rmse
def train(args):
print(args)
dataset = MovieLens(args.data_name, args.ctx, use_one_hot_fea=args.use_one_hot_fea, symm=args.gcn_agg_norm_symm,
test_ratio=args.data_test_ratio, valid_ratio=args.data_valid_ratio)
print("Loading data finished ...\n")
args.src_in_units = dataset.user_feature_shape[1]
args.dst_in_units = dataset.movie_feature_shape[1]
args.rating_vals = dataset.possible_rating_values
### build the net
net = Net(args=args)
net.initialize(init=mx.init.Xavier(factor_type='in'), ctx=args.ctx)
net.hybridize()
nd_possible_rating_values = mx.nd.array(dataset.possible_rating_values, ctx=args.ctx, dtype=np.float32)
rating_loss_net = gluon.loss.SoftmaxCELoss()
rating_loss_net.hybridize()
trainer = gluon.Trainer(net.collect_params(), args.train_optimizer, {'learning_rate': args.train_lr})
print("Loading network finished ...\n")
### prepare training data
train_gt_labels = dataset.train_labels
train_gt_ratings = dataset.train_truths
### prepare the logger
train_loss_logger = MetricLogger(['iter', 'loss', 'rmse'], ['%d', '%.4f', '%.4f'],
os.path.join(args.save_dir, 'train_loss%d.csv' % args.save_id))
valid_loss_logger = MetricLogger(['iter', 'rmse'], ['%d', '%.4f'],
os.path.join(args.save_dir, 'valid_loss%d.csv' % args.save_id))
test_loss_logger = MetricLogger(['iter', 'rmse'], ['%d', '%.4f'],
os.path.join(args.save_dir, 'test_loss%d.csv' % args.save_id))
### declare the loss information
best_valid_rmse = np.inf
no_better_valid = 0
best_iter = -1
avg_gnorm = 0
count_rmse = 0
count_num = 0
count_loss = 0
dataset.train_enc_graph = dataset.train_enc_graph.to(args.ctx)
dataset.train_dec_graph = dataset.train_dec_graph.to(args.ctx)
dataset.valid_enc_graph = dataset.train_enc_graph
dataset.valid_dec_graph = dataset.valid_dec_graph.to(args.ctx)
dataset.test_enc_graph = dataset.test_enc_graph.to(args.ctx)
dataset.test_dec_graph = dataset.test_dec_graph.to(args.ctx)
print("Start training ...")
dur = []
for iter_idx in range(1, args.train_max_iter):
if iter_idx > 3:
t0 = time.time()
with mx.autograd.record():
pred_ratings = net(dataset.train_enc_graph, dataset.train_dec_graph,
dataset.user_feature, dataset.movie_feature)
loss = rating_loss_net(pred_ratings, train_gt_labels).mean()
loss.backward()
count_loss += loss.asscalar()
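# Clip the global gradient norm across all parameters before the optimizer step
# (see params_clip_global_norm in utils.py).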
gnorm = params_clip_global_norm(net.collect_params(), args.train_grad_clip, args.ctx)
avg_gnorm += gnorm
trainer.step(1.0)
if iter_idx > 3:
dur.append(time.time() - t0)
if iter_idx == 1:
print("Total #Param of net: %d" % (gluon_total_param_num(net)))
print(gluon_net_info(net, save_path=os.path.join(args.save_dir, 'net%d.txt' % args.save_id)))
real_pred_ratings = (mx.nd.softmax(pred_ratings, axis=1) *
nd_possible_rating_values.reshape((1, -1))).sum(axis=1)
rmse = mx.nd.square(real_pred_ratings - train_gt_ratings).sum()
count_rmse += rmse.asscalar()
count_num += pred_ratings.shape[0]
if iter_idx % args.train_log_interval == 0:
train_loss_logger.log(iter=iter_idx,
loss=count_loss/(iter_idx+1), rmse=count_rmse/count_num)
logging_str = "Iter={}, gnorm={:.3f}, loss={:.4f}, rmse={:.4f}, time={:.4f}".format(
iter_idx, avg_gnorm/args.train_log_interval,
count_loss/iter_idx, count_rmse/count_num,
np.average(dur))
avg_gnorm = 0
count_rmse = 0
count_num = 0
if iter_idx % args.train_valid_interval == 0:
valid_rmse = evaluate(args=args, net=net, dataset=dataset, segment='valid')
valid_loss_logger.log(iter = iter_idx, rmse = valid_rmse)
logging_str += ',\tVal RMSE={:.4f}'.format(valid_rmse)
if valid_rmse < best_valid_rmse:
best_valid_rmse = valid_rmse
no_better_valid = 0
best_iter = iter_idx
#net.save_parameters(filename=os.path.join(args.save_dir, 'best_valid_net{}.params'.format(args.save_id)))
test_rmse = evaluate(args=args, net=net, dataset=dataset, segment='test')
best_test_rmse = test_rmse
test_loss_logger.log(iter=iter_idx, rmse=test_rmse)
logging_str += ', Test RMSE={:.4f}'.format(test_rmse)
else:
no_better_valid += 1
if no_better_valid > args.train_early_stopping_patience\
and trainer.learning_rate <= args.train_min_lr:
logging.info("Early stopping threshold reached. Stop training.")
break
if no_better_valid > args.train_decay_patience:
new_lr = max(trainer.learning_rate * args.train_lr_decay_factor, args.train_min_lr)
if new_lr < trainer.learning_rate:
logging.info("\tChange the LR to %g" % new_lr)
trainer.set_learning_rate(new_lr)
no_better_valid = 0
if iter_idx % args.train_log_interval == 0:
print(logging_str)
print('Best Iter Idx={}, Best Valid RMSE={:.4f}, Best Test RMSE={:.4f}'.format(
best_iter, best_valid_rmse, best_test_rmse))
train_loss_logger.close()
valid_loss_logger.close()
test_loss_logger.close()
def config():
parser = argparse.ArgumentParser(description='Run the baseline method.')
parser.add_argument('--seed', default=123, type=int)
parser.add_argument('--ctx', dest='ctx', default='gpu0', type=str,
help='Running Context. E.g `--ctx gpu` or `--ctx gpu0,gpu1` or `--ctx cpu`')
parser.add_argument('--save_dir', type=str, help='The saving directory')
parser.add_argument('--save_id', type=int, help='The saving log id')
parser.add_argument('--silent', action='store_true')
parser.add_argument('--data_name', default='ml-1m', type=str,
help='The dataset name: ml-100k, ml-1m, ml-10m')
parser.add_argument('--data_test_ratio', type=float, default=0.1) ## for ml-100k the test ratio is 0.2
parser.add_argument('--data_valid_ratio', type=float, default=0.1)
parser.add_argument('--use_one_hot_fea', action='store_true', default=False)
#parser.add_argument('--model_remove_rating', type=bool, default=False)
parser.add_argument('--model_activation', type=str, default="leaky")
parser.add_argument('--gcn_dropout', type=float, default=0.7)
parser.add_argument('--gcn_agg_norm_symm', type=bool, default=True)
parser.add_argument('--gcn_agg_units', type=int, default=500)
parser.add_argument('--gcn_agg_accum', type=str, default="sum")
parser.add_argument('--gcn_out_units', type=int, default=75)
parser.add_argument('--gen_r_num_basis_func', type=int, default=2)
# parser.add_argument('--train_rating_batch_size', type=int, default=10000)
parser.add_argument('--train_max_iter', type=int, default=2000)
parser.add_argument('--train_log_interval', type=int, default=1)
parser.add_argument('--train_valid_interval', type=int, default=1)
parser.add_argument('--train_optimizer', type=str, default="adam")
parser.add_argument('--train_grad_clip', type=float, default=1.0)
parser.add_argument('--train_lr', type=float, default=0.01)
parser.add_argument('--train_min_lr', type=float, default=0.001)
parser.add_argument('--train_lr_decay_factor', type=float, default=0.5)
parser.add_argument('--train_decay_patience', type=int, default=50)
parser.add_argument('--train_early_stopping_patience', type=int, default=100)
parser.add_argument('--share_param', default=False, action='store_true')
args = parser.parse_args()
args.ctx = parse_ctx(args.ctx)[0]
### configure save_dir to save all the info
if args.save_dir is None:
args.save_dir = args.data_name+"_" + ''.join(random.choices(string.ascii_uppercase + string.digits, k=2))
if args.save_id is None:
args.save_id = np.random.randint(20)
args.save_dir = os.path.join("log", args.save_dir)
if not os.path.isdir(args.save_dir):
os.makedirs(args.save_dir)
return args
if __name__ == '__main__':
args = config()
np.random.seed(args.seed)
mx.random.seed(args.seed, args.ctx)
train(args)
import ast
import os
import csv
import inspect
import logging
import re
import mxnet.ndarray as nd
from mxnet import gluon
from mxnet.gluon import nn
import mxnet as mx
import numpy as np
from collections import OrderedDict
class MetricLogger(object):
def __init__(self, attr_names, parse_formats, save_path):
self._attr_format_dict = OrderedDict(zip(attr_names, parse_formats))
self._file = open(save_path, 'w')
self._csv = csv.writer(self._file)
self._csv.writerow(attr_names)
self._file.flush()
def log(self, **kwargs):
self._csv.writerow([parse_format % kwargs[attr_name]
for attr_name, parse_format in self._attr_format_dict.items()])
self._file.flush()
def close(self):
self._file.close()
def parse_ctx(ctx_args):
ctx = re.findall(r'([a-z]+)(\d*)', ctx_args)
ctx = [(device, int(num)) if len(num) > 0 else (device, 0) for device, num in ctx]
ctx = [mx.Context(*ele) for ele in ctx]
return ctx
def gluon_total_param_num(net):
return sum([np.prod(v.shape) for v in net.collect_params().values()])
def gluon_net_info(net, save_path=None):
info_str = 'Total Param Number: {}\n'.format(gluon_total_param_num(net)) +\
'Params:\n'
for k, v in net.collect_params().items():
info_str += '\t{}: {}, {}\n'.format(k, v.shape, np.prod(v.shape))
info_str += str(net)
if save_path is not None:
with open(save_path, 'w') as f:
f.write(info_str)
return info_str
def params_clip_global_norm(param_dict, clip, ctx):
grads = [p.grad(ctx) for p in param_dict.values()]
gnorm = gluon.utils.clip_global_norm(grads, clip)
return gnorm
def get_activation(act):
"""Get the activation based on the act string
Parameters
----------
act: str or HybridBlock
Returns
-------
ret: HybridBlock
"""
if act is None:
return lambda x: x
if isinstance(act, str):
if act == 'leaky':
return nn.LeakyReLU(0.1)
elif act in ['relu', 'sigmoid', 'tanh', 'softrelu', 'softsign']:
return nn.Activation(act)
else:
raise NotImplementedError
else:
return act
# Stochastic Training for Graph Convolutional Networks
* Paper: [Control Variate](https://arxiv.org/abs/1710.10568)
* Paper: [Skip Connection](https://arxiv.org/abs/1809.05343)
* Author's code: [https://github.com/thu-ml/stochastic_gcn](https://github.com/thu-ml/stochastic_gcn)
### Dependencies
- MXNet nightly build
```bash
pip install mxnet --pre
```
### Neighbor Sampling & Skip Connection
cora: test accuracy ~83% with `--num-neighbors 2`, ~84% by training on the full graph
```
DGLBACKEND=mxnet python3 train.py --model gcn_ns --dataset cora --self-loop --num-neighbors 2 --batch-size 1000 --test-batch-size 5000
```
citeseer: test accuracy ~69% with `--num-neighbors 2`, ~70% by training on the full graph
```
DGLBACKEND=mxnet python3 train.py --model gcn_ns --dataset citeseer --self-loop --num-neighbors 2 --batch-size 1000 --test-batch-size 5000
```
pubmed: test accuracy ~78% with `--num-neighbors 3`, ~77% by training on the full graph
```
DGLBACKEND=mxnet python3 train.py --model gcn_ns --dataset pubmed --self-loop --num-neighbors 3 --batch-size 1000 --test-batch-size 5000
```
reddit: test accuracy ~91% with `--num-neighbors 3` and `--batch-size 1000`, ~93% by training on the full graph
```
DGLBACKEND=mxnet python3 train.py --model gcn_ns --dataset reddit-self-loop --num-neighbors 3 --batch-size 1000 --test-batch-size 5000 --n-hidden 64
```
### Control Variate & Skip Connection
cora: test accuracy ~84% with `--num-neighbors 1`, ~84% by training on the full graph
```
DGLBACKEND=mxnet python3 train.py --model gcn_cv --dataset cora --self-loop --num-neighbors 1 --batch-size 1000000 --test-batch-size 1000000
```
citeseer: test accuracy ~69% with `--num-neighbors 1`, ~70% by training on the full graph
```
DGLBACKEND=mxnet python3 train.py --model gcn_cv --dataset citeseer --self-loop --num-neighbors 1 --batch-size 1000000 --test-batch-size 1000000
```
pubmed: test accuracy ~79% with `--num-neighbors 1`, ~77% by training on the full graph
```
DGLBACKEND=mxnet python3 train.py --model gcn_cv --dataset pubmed --self-loop --num-neighbors 1 --batch-size 1000000 --test-batch-size 1000000
```
reddit: test accuracy ~93% with `--num-neighbors 1` and `--batch-size 1000`, ~93% by training on the full graph
```
DGLBACKEND=mxnet python3 train.py --model gcn_cv --dataset reddit-self-loop --num-neighbors 1 --batch-size 10000 --test-batch-size 5000 --n-hidden 64
```
### Control Variate & GraphSAGE-mean
Following [Control Variate](https://arxiv.org/abs/1710.10568), we use the mean-pooling architecture GraphSAGE-mean, with two linear layers and layer normalization per graph convolution layer (a rough sketch of such a layer follows the run command below).
reddit: test accuracy 96.1% with `--num-neighbors 1` and `--batch-size 1000`, ~96.2% in [Control Variate](https://arxiv.org/abs/1710.10568) with `--num-neighbors 2` and `--batch-size 1000`
```
DGLBACKEND=mxnet python3 train.py --model graphsage_cv --batch-size 1000 --test-batch-size 5000 --n-epochs 50 --dataset reddit --num-neighbors 1 --n-hidden 128 --dropout 0.2 --weight-decay 0
```
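As a rough illustration of the block described above (and only an illustration: the actual example additionally uses neighbor sampling and control variates, and the class/variable names here are hypothetical), a full-graph GraphSAGE-mean layer might look like:

```python
from mxnet.gluon import nn, Block
import dgl.function as fn

class SAGEMeanLayer(Block):
    """Sketch of a GraphSAGE-mean layer: mean-pool neighbor features,
    apply two linear layers (self and neighbor), then layer normalization."""
    def __init__(self, in_feats, out_feats, **kwargs):
        super(SAGEMeanLayer, self).__init__(**kwargs)
        with self.name_scope():
            self.fc_self = nn.Dense(out_feats, in_units=in_feats)
            self.fc_neigh = nn.Dense(out_feats, in_units=in_feats)
            self.norm = nn.LayerNorm()

    def forward(self, g, h):
        g = g.local_var()
        g.ndata['h'] = h
        # mean aggregation over incoming neighbors
        g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'neigh'))
        return self.norm(self.fc_self(h) + self.fc_neigh(g.ndata['neigh']))
```

Stacking a few such layers with a nonlinearity in between gives the full-graph counterpart of the sampled model trained by the command above.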
### Run multi-processing training
When training a GNN model with multiple processes, there are two steps.
Step 1: run a graph store server separately that loads the reddit dataset with four workers.
```
python3 examples/mxnet/sampling/run_store_server.py --dataset reddit --num-workers 4
```
Step 2: run four workers to train GraphSage on the reddit dataset.
```
python3 ../incubator-mxnet/tools/launch.py -n 4 -s 1 --launcher local python3 multi_process_train.py --model graphsage_cv --batch-size 2500 --test-batch-size 5000 --n-epochs 1 --graph-name reddit --num-neighbors 1 --n-hidden 128 --dropout 0.2 --weight-decay 0
```
# Stochastic Training for Graph Convolutional Networks Using Distributed Sampler
* Paper: [Control Variate](https://arxiv.org/abs/1710.10568)
* Paper: [Skip Connection](https://arxiv.org/abs/1809.05343)
* Author's code: [https://github.com/thu-ml/stochastic_gcn](https://github.com/thu-ml/stochastic_gcn)
### Dependencies
- MXNet nightly build
```bash
pip install mxnet --pre
```
### Usage Guide
Assume that the user has already launched two instances (`instance_0` and `instance_1`) on AWS EC2, and that the two instances can reach each other over TCP/IP. We treat `instance_0` as the `Trainer` and `instance_1` as the `Sampler`. The user can then start the trainer process and the sampler process on the two instances separately. We provide a set of scripts to start the trainer and sampler processes; users only need to change `--ip` to their own IP address.
Once the trainer process starts, users will see logging output like the following:
```
[04:48:20] .../socket_communicator.cc:68: Bind to 127.0.0.1:2049
[04:48:20] .../socket_communicator.cc:74: Listen on 127.0.0.1:2049, wait sender connect ...
```
After that, the user can start the sampler process. On the sampler instance, the `--num-sampler` option sets the number of sampler processes: `sampler.py` starts `--num-sampler` processes concurrently to maximize system utilization. Users can also launch many samplers in parallel across a set of machines. For example, with `10` sampler instances each started with `--num-sampler 2`, the trainer instance must set `--num-sampler` to `20`, the total number of sampler processes. A minimal sketch of the trainer/sampler pairing is shown below.
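This sketch is illustrative only: the address, the toy graph, and the sampling parameters are placeholders, and the real `sampler.py` and `train.py` scripts add the model-specific preprocessing and training loop.
```python
import dgl
import networkx as nx
from dgl import DGLGraph

addr = '127.0.0.1:50051'                         # placeholder, matches --ip
g = DGLGraph(nx.path_graph(100), readonly=True)  # toy stand-in for the dataset

def run_sampler():
    # Sampler side: one SamplerSender per sampler process; the namebook maps
    # receiver (trainer) ids to their addresses.
    sender = dgl.contrib.sampling.SamplerSender(namebook={0: addr})
    for nf in dgl.contrib.sampling.NeighborSampler(g, 10, 2,
                                                   neighbor_type='in',
                                                   num_hops=2):
        sender.send(nf, 0)    # ship each sampled NodeFlow to trainer 0
    sender.signal(0)          # tell the trainer the epoch is finished

def run_trainer():
    # Trainer side: a SamplerReceiver yields the NodeFlows sent by num_sender
    # sampler processes; the training loop consumes them like a local sampler.
    receiver = dgl.contrib.sampling.SamplerReceiver(graph=g, addr=addr,
                                                    num_sender=1)
    for nf in receiver:
        pass                  # forward/backward pass goes here
```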
### Neighbor Sampling & Skip Connection
#### cora
Test accuracy ~83% with `--num-neighbors 2`, ~84% by training on the full graph
Trainer side:
```
DGLBACKEND=mxnet python3 train.py --model gcn_ns --dataset cora --self-loop --num-neighbors 2 --batch-size 1000 --test-batch-size 5000 --ip 127.0.0.1:50051 --num-sampler 1
```
Sampler side:
```
OMP_NUM_THREADS=1 DGLBACKEND=mxnet python3 sampler.py --model gcn_ns --dataset cora --self-loop --num-neighbors 2 --batch-size 1000 --ip 127.0.0.1:50051 --num-sampler 1
```
#### citeseer
Test accuracy ~69% with `--num-neighbors 2`, ~70% by training on the full graph
Trainer side:
```
DGLBACKEND=mxnet python3 train.py --model gcn_ns --dataset citeseer --self-loop --num-neighbors 2 --batch-size 1000 --test-batch-size 5000 --ip 127.0.0.1:50051 --num-sampler 1
```
Sampler side:
```
OMP_NUM_THREADS=1 DGLBACKEND=mxnet python3 sampler.py --model gcn_ns --dataset citeseer --self-loop --num-neighbors 2 --batch-size 1000 --ip 127.0.0.1:50051 --num-sampler 1
```
#### pubmed
Test accuracy ~78% with `--num-neighbors 3`, ~77% by training on the full graph
Trainer side:
```
DGLBACKEND=mxnet python3 train.py --model gcn_ns --dataset pubmed --self-loop --num-neighbors 3 --batch-size 1000 --test-batch-size 5000 --ip 127.0.0.1:50051 --num-sampler 1
```
Sampler side:
```
OMP_NUM_THREADS=1 DGLBACKEND=mxnet python3 sampler.py --model gcn_ns --dataset pubmed --self-loop --num-neighbors 3 --batch-size 1000 --ip 127.0.0.1:50051 --num-sampler 1
```
#### reddit
Test accuracy ~91% with `--num-neighbors 2` and `--batch-size 1000`, ~93% by training on the full graph
Trainer side:
```
DGLBACKEND=mxnet python3 train.py --model gcn_ns --dataset reddit-self-loop --num-neighbors 2 --batch-size 1000 --test-batch-size 5000 --n-hidden 64 --ip 127.0.0.1:2049 --num-sampler 1
```
Sampler side:
```
OMP_NUM_THREADS=1 DGLBACKEND=mxnet python3 sampler.py --model gcn_ns --dataset reddit-self-loop --num-neighbors 2 --batch-size 1000 --ip 127.0.0.1:2049 --num-sampler 1
```
### Control Variate & Skip Connection
#### cora
Test accuracy ~84% with `--num-neighbors 1`, ~84% by training on the full graph
Trainer side:
```
DGLBACKEND=mxnet python3 train.py --model gcn_cv --dataset cora --self-loop --num-neighbors 1 --batch-size 1000000 --test-batch-size 1000000 --ip 127.0.0.1:50051 --num-sampler 1
```
Sampler side:
```
OMP_NUM_THREADS=1 DGLBACKEND=mxnet python3 sampler.py --model gcn_cv --dataset cora --self-loop --num-neighbors 1 --batch-size 1000000 --ip 127.0.0.1:50051 --num-sampler 1
```
#### citeseer
Test accuracy ~69% with `--num-neighbors 1`, ~70% by training on the full graph
Trainer side:
```
DGLBACKEND=mxnet python3 train.py --model gcn_cv --dataset citeseer --self-loop --num-neighbors 1 --batch-size 1000000 --test-batch-size 1000000 --ip 127.0.0.1:50051 --num-sampler 1
```
Sampler side:
```
OMP_NUM_THREADS=1 DGLBACKEND=mxnet python3 sampler.py --model gcn_cv --dataset citeseer --self-loop --num-neighbors 1 --batch-size 1000000 --ip 127.0.0.1:50051 --num-sampler 1
```
#### pubmed
Trainer side:
```
DGLBACKEND=mxnet python3 train.py --model gcn_cv --dataset pubmed --self-loop --num-neighbors 1 --batch-size 1000000 --test-batch-size 1000000 --ip 127.0.0.1:50051 --num-sampler 1
```
Sampler side:
```
OMP_NUM_THREADS=1 DGLBACKEND=mxnet python3 sampler.py --model gcn_cv --dataset pubmed --self-loop --num-neighbors 1 --batch-size 1000000 --ip 127.0.0.1:50051 --num-sampler 1
```
#### reddit
Test accuracy ~93% with `--num-neighbors 1` and `--batch-size 1000`, ~93% by training on the full graph
Trainer side:
```
DGLBACKEND=mxnet python3 train.py --model gcn_cv --dataset reddit-self-loop --num-neighbors 1 --batch-size 10000 --test-batch-size 5000 --n-hidden 64 --ip 127.0.0.1:50051 --num-sampler 1
```
Sampler side:
```
OMP_NUM_THREADS=1 DGLBACKEND=mxnet python3 sampler.py --model gcn_cv --dataset reddit-self-loop --num-neighbors 1 --batch-size 10000 --ip 127.0.0.1:50051 --num-sampler 1
```
### Control Variate & GraphSAGE-mean
Following [Control Variate](https://arxiv.org/abs/1710.10568), we use the GraphSAGE-mean architecture, with two linear layers and layer normalization per graph convolution layer.
#### reddit
Test accuracy 96.1% with `--num-neighbors 1` and `--batch-size 1000`, ~96.2% in [Control Variate](https://arxiv.org/abs/1710.10568) with `--num-neighbors 2` and `--batch-size 1000`
Trainer side:
```
DGLBACKEND=mxnet python3 train.py --model graphsage_cv --batch-size 1000 --test-batch-size 5000 --n-epochs 50 --dataset reddit --num-neighbors 1 --n-hidden 128 --dropout 0.2 --weight-decay 0 --ip 127.0.0.1:50051 --num-sampler 1
```
Sampler side:
```
OMP_NUM_THREADS=1 DGLBACKEND=mxnet python3 sampler.py --model graphsage_cv --batch-size 1000 --dataset reddit --num-neighbors 1 --ip 127.0.0.1:50051 --num-sampler 1
```
import os, sys
import argparse, time, math
import numpy as np
import mxnet as mx
from mxnet import gluon
import dgl
import dgl.function as fn
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
parentdir=os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, parentdir)
from gcn_cv_sc import NodeUpdate, GCNSampling, GCNInfer
def gcn_cv_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples, distributed):
n0_feats = g.nodes[0].data['features']
num_nodes = g.number_of_nodes()
in_feats = n0_feats.shape[1]
g_ctx = n0_feats.context
norm = mx.nd.expand_dims(1./g.in_degrees().astype('float32'), 1)
g.set_n_repr({'norm': norm.as_in_context(g_ctx)})
degs = g.in_degrees().astype('float32').asnumpy()
degs[degs > args.num_neighbors] = args.num_neighbors
g.set_n_repr({'subg_norm': mx.nd.expand_dims(mx.nd.array(1./degs, ctx=g_ctx), 1)})
n_layers = args.n_layers
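# Precompute the first-hop aggregation once on the full graph: 'preprocess'
# stores, for each node, the degree-normalized sum of its neighbors' input
# features, so mini-batches can reuse it instead of re-aggregating raw features.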
g.update_all(fn.copy_src(src='features', out='m'),
fn.sum(msg='m', out='preprocess'),
lambda node : {'preprocess': node.data['preprocess'] * node.data['norm']})
for i in range(n_layers - 1):
g.init_ndata('h_{}'.format(i), (num_nodes, args.n_hidden), 'float32')
g.init_ndata('agg_h_{}'.format(i), (num_nodes, args.n_hidden), 'float32')
g.init_ndata('h_{}'.format(n_layers-1), (num_nodes, 2*args.n_hidden), 'float32')
g.init_ndata('agg_h_{}'.format(n_layers-1), (num_nodes, 2*args.n_hidden), 'float32')
model = GCNSampling(in_feats,
args.n_hidden,
n_classes,
n_layers,
mx.nd.relu,
args.dropout,
prefix='GCN')
model.initialize(ctx=ctx)
loss_fcn = gluon.loss.SoftmaxCELoss()
infer_model = GCNInfer(in_feats,
args.n_hidden,
n_classes,
n_layers,
mx.nd.relu,
prefix='GCN')
infer_model.initialize(ctx=ctx)
# use optimizer
print(model.collect_params())
kv_type = 'dist_sync' if distributed else 'local'
trainer = gluon.Trainer(model.collect_params(), 'adam',
{'learning_rate': args.lr, 'wd': args.weight_decay},
kvstore=mx.kv.create(kv_type))
# Create sampler receiver
sampler = dgl.contrib.sampling.SamplerReceiver(graph=g, addr=args.ip, num_sender=args.num_sampler)
# initialize graph
dur = []
adj = g.adjacency_matrix(transpose=False).as_in_context(g_ctx)
for epoch in range(args.n_epochs):
start = time.time()
if distributed:
msg_head = "Worker {:d}, epoch {:d}".format(g.worker_id, epoch)
else:
msg_head = "epoch {:d}".format(epoch)
for nf in sampler:
for i in range(n_layers):
agg_history_str = 'agg_h_{}'.format(i)
dests = nf.layer_parent_nid(i+1).as_in_context(g_ctx)
# TODO we could use DGLGraph.pull to implement this, but the current
# implementation of pull is very slow. Let's manually do it for now.
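# Refresh the cached neighborhood aggregates for this layer: take the adjacency
# rows of the destination nodes and multiply by the stored history h_i, giving
# agg_h_i, the full-neighborhood term used by the control-variate update.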
agg = mx.nd.dot(mx.nd.take(adj, dests), g.nodes[:].data['h_{}'.format(i)])
g.set_n_repr({agg_history_str: agg}, dests)
node_embed_names = [['preprocess', 'h_0']]
for i in range(1, n_layers):
node_embed_names.append(['h_{}'.format(i), 'agg_h_{}'.format(i-1), 'subg_norm', 'norm'])
node_embed_names.append(['agg_h_{}'.format(n_layers-1), 'subg_norm', 'norm'])
nf.copy_from_parent(node_embed_names=node_embed_names, ctx=ctx)
# forward
with mx.autograd.record():
pred = model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
loss = loss_fcn(pred, batch_labels)
loss = loss.sum() / len(batch_nids)
loss.backward()
trainer.step(batch_size=1)
node_embed_names = [['h_{}'.format(i)] for i in range(n_layers)]
node_embed_names.append([])
nf.copy_to_parent(node_embed_names=node_embed_names)
mx.nd.waitall()
print(msg_head + ': training takes ' + str(time.time() - start))
infer_params = infer_model.collect_params()
for key in infer_params:
idx = trainer._param2idx[key]
trainer._kvstore.pull(idx, out=infer_params[key].data())
num_acc = 0.
num_tests = 0
if not distributed or g.worker_id == 0:
for nf in dgl.contrib.sampling.NeighborSampler(g, args.test_batch_size,
g.number_of_nodes(),
neighbor_type='in',
num_hops=n_layers,
seed_nodes=test_nid):
node_embed_names = [['preprocess']]
for i in range(n_layers):
node_embed_names.append(['norm'])
nf.copy_from_parent(node_embed_names=node_embed_names, ctx=ctx)
pred = infer_model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
num_acc += (pred.argmax(axis=1) == batch_labels).sum().asscalar()
num_tests += nf.layer_size(-1)
if distributed:
g._sync_barrier()
print("Test Accuracy {:.4f}". format(num_acc/num_tests))
break
elif distributed:
g._sync_barrier()
import os, sys
import argparse, time, math
import numpy as np
import mxnet as mx
from mxnet import gluon
from functools import partial
import dgl
import dgl.function as fn
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
parentdir=os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, parentdir)
from gcn_ns_sc import NodeUpdate, GCNSampling, GCNInfer
def gcn_ns_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples):
n0_feats = g.nodes[0].data['features']
in_feats = n0_feats.shape[1]
g_ctx = n0_feats.context
degs = g.in_degrees().astype('float32').as_in_context(g_ctx)
norm = mx.nd.expand_dims(1./degs, 1)
g.set_n_repr({'norm': norm})
model = GCNSampling(in_feats,
args.n_hidden,
n_classes,
args.n_layers,
mx.nd.relu,
args.dropout,
prefix='GCN')
model.initialize(ctx=ctx)
loss_fcn = gluon.loss.SoftmaxCELoss()
infer_model = GCNInfer(in_feats,
args.n_hidden,
n_classes,
args.n_layers,
mx.nd.relu,
prefix='GCN')
infer_model.initialize(ctx=ctx)
# use optimizer
print(model.collect_params())
trainer = gluon.Trainer(model.collect_params(), 'adam',
{'learning_rate': args.lr, 'wd': args.weight_decay},
kvstore=mx.kv.create('local'))
# Create sampler receiver
sampler = dgl.contrib.sampling.SamplerReceiver(graph=g, addr=args.ip, num_sender=args.num_sampler)
# initialize graph
dur = []
for epoch in range(args.n_epochs):
for nf in sampler:
nf.copy_from_parent(ctx=ctx)
# forward
with mx.autograd.record():
pred = model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
loss = loss_fcn(pred, batch_labels)
loss = loss.sum() / len(batch_nids)
loss.backward()
trainer.step(batch_size=1)
infer_params = infer_model.collect_params()
for key in infer_params:
idx = trainer._param2idx[key]
trainer._kvstore.pull(idx, out=infer_params[key].data())
num_acc = 0.
num_tests = 0
for nf in dgl.contrib.sampling.NeighborSampler(g, args.test_batch_size,
g.number_of_nodes(),
neighbor_type='in',
num_hops=args.n_layers+1,
seed_nodes=test_nid):
nf.copy_from_parent(ctx=ctx)
pred = infer_model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
num_acc += (pred.argmax(axis=1) == batch_labels).sum().asscalar()
num_tests += nf.layer_size(-1)
break
print("Test Accuracy {:.4f}". format(num_acc/num_tests))
import os, sys
import argparse, time, math
import numpy as np
import mxnet as mx
from mxnet import gluon
import dgl
import dgl.function as fn
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
parentdir=os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, parentdir)
from graphsage_cv import GraphSAGELayer, NodeUpdate
from graphsage_cv import GraphSAGETrain, GraphSAGEInfer
def graphsage_cv_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples, distributed):
n0_feats = g.nodes[0].data['features']
num_nodes = g.number_of_nodes()
in_feats = n0_feats.shape[1]
g_ctx = n0_feats.context
norm = mx.nd.expand_dims(1./g.in_degrees().astype('float32'), 1)
g.set_n_repr({'norm': norm.as_in_context(g_ctx)})
degs = g.in_degrees().astype('float32').asnumpy()
degs[degs > args.num_neighbors] = args.num_neighbors
g.set_n_repr({'subg_norm': mx.nd.expand_dims(mx.nd.array(1./degs, ctx=g_ctx), 1)})
n_layers = args.n_layers
g.update_all(fn.copy_src(src='features', out='m'),
fn.sum(msg='m', out='preprocess'),
lambda node : {'preprocess': node.data['preprocess'] * node.data['norm']})
for i in range(n_layers):
g.init_ndata('h_{}'.format(i), (num_nodes, args.n_hidden), 'float32')
g.init_ndata('agg_h_{}'.format(i), (num_nodes, args.n_hidden), 'float32')
model = GraphSAGETrain(in_feats,
args.n_hidden,
n_classes,
n_layers,
args.dropout,
prefix='GraphSAGE')
model.initialize(ctx=ctx)
loss_fcn = gluon.loss.SoftmaxCELoss()
infer_model = GraphSAGEInfer(in_feats,
args.n_hidden,
n_classes,
n_layers,
prefix='GraphSAGE')
infer_model.initialize(ctx=ctx)
# use optimizer
print(model.collect_params())
kv_type = 'dist_sync' if distributed else 'local'
trainer = gluon.Trainer(model.collect_params(), 'adam',
{'learning_rate': args.lr, 'wd': args.weight_decay},
kvstore=mx.kv.create(kv_type))
# Create sampler receiver
sampler = dgl.contrib.sampling.SamplerReceiver(graph=g, addr=args.ip, num_sender=args.num_sampler)
# initialize graph
dur = []
adj = g.adjacency_matrix(transpose=False).as_in_context(g_ctx)
for epoch in range(args.n_epochs):
start = time.time()
if distributed:
msg_head = "Worker {:d}, epoch {:d}".format(g.worker_id, epoch)
else:
msg_head = "epoch {:d}".format(epoch)
for nf in sampler:
for i in range(n_layers):
agg_history_str = 'agg_h_{}'.format(i)
dests = nf.layer_parent_nid(i+1).as_in_context(g_ctx)
# TODO we could use DGLGraph.pull to implement this, but the current
# implementation of pull is very slow. Let's manually do it for now.
agg = mx.nd.dot(mx.nd.take(adj, dests), g.nodes[:].data['h_{}'.format(i)])
g.set_n_repr({agg_history_str: agg}, dests)
node_embed_names = [['preprocess', 'features', 'h_0']]
for i in range(1, n_layers):
node_embed_names.append(['h_{}'.format(i), 'agg_h_{}'.format(i-1), 'subg_norm', 'norm'])
node_embed_names.append(['agg_h_{}'.format(n_layers-1), 'subg_norm', 'norm'])
nf.copy_from_parent(node_embed_names=node_embed_names, ctx=ctx)
# forward
with mx.autograd.record():
pred = model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
loss = loss_fcn(pred, batch_labels)
if distributed:
loss = loss.sum() / (len(batch_nids) * g.num_workers)
else:
loss = loss.sum() / (len(batch_nids))
loss.backward()
trainer.step(batch_size=1)
node_embed_names = [['h_{}'.format(i)] for i in range(n_layers)]
node_embed_names.append([])
nf.copy_to_parent(node_embed_names=node_embed_names)
mx.nd.waitall()
print(msg_head + ': training takes ' + str(time.time() - start))
infer_params = infer_model.collect_params()
for key in infer_params:
idx = trainer._param2idx[key]
trainer._kvstore.pull(idx, out=infer_params[key].data())
num_acc = 0.
num_tests = 0
if not distributed or g.worker_id == 0:
for nf in dgl.contrib.sampling.NeighborSampler(g, args.test_batch_size,
g.number_of_nodes(),
neighbor_type='in',
num_hops=n_layers,
seed_nodes=test_nid,
add_self_loop=True):
node_embed_names = [['preprocess', 'features']]
for i in range(n_layers):
node_embed_names.append(['norm', 'subg_norm'])
nf.copy_from_parent(node_embed_names=node_embed_names, ctx=ctx)
pred = infer_model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
num_acc += (pred.argmax(axis=1) == batch_labels).sum().asscalar()
num_tests += nf.layer_size(-1)
if distributed:
g._sync_barrier()
print(msg_head + ": Test Accuracy {:.4f}". format(num_acc/num_tests))
break
elif distributed:
g._sync_barrier()
import argparse, time, math
import numpy as np
import mxnet as mx
from mxnet import gluon
from functools import partial
import dgl
import dgl.function as fn
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
from dgl.contrib.sampling import SamplerPool
class MySamplerPool(SamplerPool):
def worker(self, args):
"""User-defined worker function
"""
is_shuffle = True
self_loop = False
number_hops = 1
if args.model == "gcn_ns":
number_hops = args.n_layers + 1
elif args.model == "gcn_cv":
number_hops = args.n_layers
elif args.model == "graphsage_cv":
number_hops = args.n_layers
self_loop = True
else:
print("unknown model. Please choose from gcn_ns, gcn_cv, graphsage_cv")
# Start sender
namebook = { 0:args.ip }
sender = dgl.contrib.sampling.SamplerSender(namebook)
# load and preprocess dataset
data = load_data(args)
ctx = mx.cpu()
if args.self_loop and not args.dataset.startswith('reddit'):
data.graph.add_edges_from([(i,i) for i in range(len(data.graph))])
train_nid = mx.nd.array(np.nonzero(data.train_mask)[0]).astype(np.int64).as_in_context(ctx)
test_nid = mx.nd.array(np.nonzero(data.test_mask)[0]).astype(np.int64).as_in_context(ctx)
# create GCN model
g = DGLGraph(data.graph, readonly=True)
while True:
idx = 0
for nf in dgl.contrib.sampling.NeighborSampler(g, args.batch_size,
args.num_neighbors,
neighbor_type='in',
shuffle=is_shuffle,
num_workers=32,
num_hops=number_hops,
add_self_loop=self_loop,
seed_nodes=train_nid):
print("send train nodeflow: %d" %(idx))
sender.send(nf, 0)
idx += 1
sender.signal(0)
def main(args):
pool = MySamplerPool()
pool.start(args.num_sampler, args)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
register_data_args(parser)
parser.add_argument("--model", type=str,
help="select a model. Valid models: gcn_ns, gcn_cv, graphsage_cv")
parser.add_argument("--batch-size", type=int, default=1000,
help="batch size")
parser.add_argument("--num-neighbors", type=int, default=3,
help="number of neighbors to be sampled")
parser.add_argument("--self-loop", action='store_true',
help="graph self-loop (default=False)")
parser.add_argument("--n-layers", type=int, default=1,
help="number of hidden gcn layers")
parser.add_argument("--ip", type=str, default='127.0.0.1:50051',
help="IP address")
parser.add_argument("--num-sampler", type=int, default=1,
help="number of sampler")
args = parser.parse_args()
print(args)
main(args)
import argparse, time, math
import numpy as np
import mxnet as mx
from mxnet import gluon
from functools import partial
import dgl
import dgl.function as fn
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
from dis_gcn_ns_sc import gcn_ns_train
from dis_gcn_cv_sc import gcn_cv_train
from dis_graphsage_cv import graphsage_cv_train
def main(args):
# load and preprocess dataset
data = load_data(args)
if args.gpu >= 0:
ctx = mx.gpu(args.gpu)
else:
ctx = mx.cpu()
if args.self_loop and not args.dataset.startswith('reddit'):
data.graph.add_edges_from([(i,i) for i in range(len(data.graph))])
train_nid = mx.nd.array(np.nonzero(data.train_mask)[0]).astype(np.int64)
test_nid = mx.nd.array(np.nonzero(data.test_mask)[0]).astype(np.int64)
features = mx.nd.array(data.features)
labels = mx.nd.array(data.labels)
train_mask = mx.nd.array(data.train_mask)
val_mask = mx.nd.array(data.val_mask)
test_mask = mx.nd.array(data.test_mask)
in_feats = features.shape[1]
n_classes = data.num_labels
n_edges = data.graph.number_of_edges()
n_train_samples = train_mask.sum().asscalar()
n_val_samples = val_mask.sum().asscalar()
n_test_samples = test_mask.sum().asscalar()
print("""----Data statistics------'
#Edges %d
#Classes %d
#Train samples %d
#Val samples %d
#Test samples %d""" %
(n_edges, n_classes,
n_train_samples,
n_val_samples,
n_test_samples))
# create GCN model
g = DGLGraph(data.graph, readonly=True)
g.ndata['features'] = features
g.ndata['labels'] = labels
if args.model == "gcn_ns":
gcn_ns_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples)
elif args.model == "gcn_cv":
gcn_cv_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples, False)
elif args.model == "graphsage_cv":
graphsage_cv_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples, False)
else:
print("unknown model. Please choose from gcn_ns, gcn_cv, graphsage_cv")
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
register_data_args(parser)
parser.add_argument("--model", type=str,
help="select a model. Valid models: gcn_ns, gcn_cv, graphsage_cv")
parser.add_argument("--dropout", type=float, default=0.5,
help="dropout probability")
parser.add_argument("--gpu", type=int, default=-1,
help="gpu")
parser.add_argument("--lr", type=float, default=3e-2,
help="learning rate")
parser.add_argument("--n-epochs", type=int, default=200,
help="number of training epochs")
parser.add_argument("--batch-size", type=int, default=1000,
help="batch size")
parser.add_argument("--test-batch-size", type=int, default=1000,
help="test batch size")
parser.add_argument("--num-neighbors", type=int, default=3,
help="number of neighbors to be sampled")
parser.add_argument("--n-hidden", type=int, default=16,
help="number of hidden gcn units")
parser.add_argument("--n-layers", type=int, default=1,
help="number of hidden gcn layers")
parser.add_argument("--self-loop", action='store_true',
help="graph self-loop (default=False)")
parser.add_argument("--weight-decay", type=float, default=5e-4,
help="Weight for L2 loss")
parser.add_argument("--nworkers", type=int, default=1,
help="number of workers")
parser.add_argument("--ip", type=str, default='127.0.0.1:50051',
help="IP address")
parser.add_argument("--num-sampler", type=int, default=1,
help="number of sampler")
args = parser.parse_args()
print(args)
main(args)
import argparse, time, math
import numpy as np
import mxnet as mx
from mxnet import gluon
import dgl
import dgl.function as fn
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
class NodeUpdate(gluon.Block):
def __init__(self, layer_id, in_feats, out_feats, dropout, activation=None, test=False, concat=False):
super(NodeUpdate, self).__init__()
self.layer_id = layer_id
self.dropout = dropout
self.test = test
self.concat = concat
with self.name_scope():
self.dense = gluon.nn.Dense(out_feats, in_units=in_feats)
self.activation = activation
def forward(self, node):
h = node.data['h']
norm = node.data['norm']
if self.test:
h = h * norm
else:
agg_history_str = 'agg_h_{}'.format(self.layer_id-1)
agg_history = node.data[agg_history_str]
subg_norm = node.data['subg_norm']
# control variate
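# 'h' holds the sum of (h - history) over the sampled neighbors only, so it is
# rescaled by the sampled-subgraph normalizer and combined with the cached
# full-neighborhood aggregate 'agg_history' scaled by the exact normalizer.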
h = h * subg_norm + agg_history * norm
if self.dropout:
h = mx.nd.Dropout(h, p=self.dropout)
h = self.dense(h)
if self.concat:
h = mx.nd.concat(h, self.activation(h))
elif self.activation:
h = self.activation(h)
return {'activation': h}
class GCNSampling(gluon.Block):
def __init__(self,
in_feats,
n_hidden,
n_classes,
n_layers,
activation,
dropout,
**kwargs):
super(GCNSampling, self).__init__(**kwargs)
self.dropout = dropout
self.n_layers = n_layers
with self.name_scope():
self.layers = gluon.nn.Sequential()
# input layer
self.dense = gluon.nn.Dense(n_hidden, in_units=in_feats)
self.activation = activation
# hidden layers
for i in range(1, n_layers):
skip_start = (i == self.n_layers-1)
self.layers.add(NodeUpdate(i, n_hidden, n_hidden, dropout, activation, concat=skip_start))
# output layer
self.layers.add(NodeUpdate(n_layers, 2*n_hidden, n_classes, dropout))
def forward(self, nf):
h = nf.layers[0].data['preprocess']
if self.dropout:
h = mx.nd.Dropout(h, p=self.dropout)
h = self.dense(h)
skip_start = (0 == self.n_layers-1)
if skip_start:
h = mx.nd.concat(h, self.activation(h))
else:
h = self.activation(h)
for i, layer in enumerate(self.layers):
new_history = h.copy().detach()
history_str = 'h_{}'.format(i)
history = nf.layers[i].data[history_str]
h = h - history
nf.layers[i].data['h'] = h
nf.block_compute(i,
fn.copy_src(src='h', out='m'),
fn.sum(msg='m', out='h'),
layer)
h = nf.layers[i+1].data.pop('activation')
# update history
if i < nf.num_layers-1:
nf.layers[i].data[history_str] = new_history
return h
class GCNInfer(gluon.Block):
def __init__(self,
in_feats,
n_hidden,
n_classes,
n_layers,
activation,
**kwargs):
super(GCNInfer, self).__init__(**kwargs)
self.n_layers = n_layers
with self.name_scope():
self.layers = gluon.nn.Sequential()
# input layer
self.dense = gluon.nn.Dense(n_hidden, in_units=in_feats)
self.activation = activation
# hidden layers
for i in range(1, n_layers):
skip_start = (i == self.n_layers-1)
self.layers.add(NodeUpdate(i, n_hidden, n_hidden, 0, activation, True, concat=skip_start))
# output layer
self.layers.add(NodeUpdate(n_layers, 2*n_hidden, n_classes, 0, None, True))
def forward(self, nf):
h = nf.layers[0].data['preprocess']
h = self.dense(h)
skip_start = (0 == self.n_layers-1)
if skip_start:
h = mx.nd.concat(h, self.activation(h))
else:
h = self.activation(h)
for i, layer in enumerate(self.layers):
nf.layers[i].data['h'] = h
nf.block_compute(i,
fn.copy_src(src='h', out='m'),
fn.sum(msg='m', out='h'),
layer)
h = nf.layers[i+1].data.pop('activation')
return h
def gcn_cv_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples, distributed):
n0_feats = g.nodes[0].data['features']
num_nodes = g.number_of_nodes()
in_feats = n0_feats.shape[1]
g_ctx = n0_feats.context
norm = mx.nd.expand_dims(1./g.in_degrees().astype('float32'), 1)
g.set_n_repr({'norm': norm.as_in_context(g_ctx)})
degs = g.in_degrees().astype('float32').asnumpy()
degs[degs > args.num_neighbors] = args.num_neighbors
g.set_n_repr({'subg_norm': mx.nd.expand_dims(mx.nd.array(1./degs, ctx=g_ctx), 1)})
n_layers = args.n_layers
g.update_all(fn.copy_src(src='features', out='m'),
fn.sum(msg='m', out='preprocess'),
lambda node : {'preprocess': node.data['preprocess'] * node.data['norm']})
for i in range(n_layers - 1):
g.init_ndata('h_{}'.format(i), (num_nodes, args.n_hidden), 'float32')
g.init_ndata('agg_h_{}'.format(i), (num_nodes, args.n_hidden), 'float32')
g.init_ndata('h_{}'.format(n_layers-1), (num_nodes, 2*args.n_hidden), 'float32')
g.init_ndata('agg_h_{}'.format(n_layers-1), (num_nodes, 2*args.n_hidden), 'float32')
model = GCNSampling(in_feats,
args.n_hidden,
n_classes,
n_layers,
mx.nd.relu,
args.dropout,
prefix='GCN')
model.initialize(ctx=ctx)
loss_fcn = gluon.loss.SoftmaxCELoss()
infer_model = GCNInfer(in_feats,
args.n_hidden,
n_classes,
n_layers,
mx.nd.relu,
prefix='GCN')
infer_model.initialize(ctx=ctx)
# use optimizer
print(model.collect_params())
kv_type = 'dist_sync' if distributed else 'local'
trainer = gluon.Trainer(model.collect_params(), 'adam',
{'learning_rate': args.lr, 'wd': args.weight_decay},
kvstore=mx.kv.create(kv_type))
# initialize graph
dur = []
adj = g.adjacency_matrix(transpose=False).as_in_context(g_ctx)
for epoch in range(args.n_epochs):
start = time.time()
if distributed:
msg_head = "Worker {:d}, epoch {:d}".format(g.worker_id, epoch)
else:
msg_head = "epoch {:d}".format(epoch)
for nf in dgl.contrib.sampling.NeighborSampler(g, args.batch_size,
args.num_neighbors,
neighbor_type='in',
num_workers=32,
shuffle=True,
num_hops=n_layers,
seed_nodes=train_nid):
for i in range(n_layers):
agg_history_str = 'agg_h_{}'.format(i)
dests = nf.layer_parent_nid(i+1).as_in_context(g_ctx)
# TODO we could use DGLGraph.pull to implement this, but the current
# implementation of pull is very slow. Let's manually do it for now.
agg = mx.nd.dot(mx.nd.take(adj, dests), g.nodes[:].data['h_{}'.format(i)])
g.set_n_repr({agg_history_str: agg}, dests)
node_embed_names = [['preprocess', 'h_0']]
for i in range(1, n_layers):
node_embed_names.append(['h_{}'.format(i), 'agg_h_{}'.format(i-1), 'subg_norm', 'norm'])
node_embed_names.append(['agg_h_{}'.format(n_layers-1), 'subg_norm', 'norm'])
nf.copy_from_parent(node_embed_names=node_embed_names, ctx=ctx)
# forward
with mx.autograd.record():
pred = model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
loss = loss_fcn(pred, batch_labels)
loss = loss.sum() / len(batch_nids)
loss.backward()
trainer.step(batch_size=1)
node_embed_names = [['h_{}'.format(i)] for i in range(n_layers)]
node_embed_names.append([])
nf.copy_to_parent(node_embed_names=node_embed_names)
mx.nd.waitall()
print(msg_head + ': training takes ' + str(time.time() - start))
infer_params = infer_model.collect_params()
for key in infer_params:
idx = trainer._param2idx[key]
trainer._kvstore.pull(idx, out=infer_params[key].data())
num_acc = 0.
num_tests = 0
if not distributed or g.worker_id == 0:
for nf in dgl.contrib.sampling.NeighborSampler(g, args.test_batch_size,
g.number_of_nodes(),
neighbor_type='in',
num_hops=n_layers,
seed_nodes=test_nid):
node_embed_names = [['preprocess']]
for i in range(n_layers):
node_embed_names.append(['norm'])
nf.copy_from_parent(node_embed_names=node_embed_names, ctx=ctx)
pred = infer_model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
num_acc += (pred.argmax(axis=1) == batch_labels).sum().asscalar()
num_tests += nf.layer_size(-1)
if distributed:
g._sync_barrier()
print("Test Accuracy {:.4f}". format(num_acc/num_tests))
break
elif distributed:
g._sync_barrier()
import argparse, time, math
import numpy as np
import mxnet as mx
from mxnet import gluon
from functools import partial
import dgl
import dgl.function as fn
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
class NodeUpdate(gluon.Block):
def __init__(self, in_feats, out_feats, activation=None, concat=False):
super(NodeUpdate, self).__init__()
self.dense = gluon.nn.Dense(out_feats, in_units=in_feats)
self.activation = activation
self.concat = concat
def forward(self, node):
h = node.data['h']
h = h * node.data['norm']
h = self.dense(h)
# skip connection
if self.concat:
h = mx.nd.concat(h, self.activation(h))
elif self.activation:
h = self.activation(h)
return {'activation': h}
class GCNSampling(gluon.Block):
def __init__(self,
in_feats,
n_hidden,
n_classes,
n_layers,
activation,
dropout,
**kwargs):
super(GCNSampling, self).__init__(**kwargs)
self.dropout = dropout
self.n_layers = n_layers
with self.name_scope():
self.layers = gluon.nn.Sequential()
# input layer
skip_start = (0 == n_layers-1)
self.layers.add(NodeUpdate(in_feats, n_hidden, activation, concat=skip_start))
# hidden layers
for i in range(1, n_layers):
skip_start = (i == n_layers-1)
self.layers.add(NodeUpdate(n_hidden, n_hidden, activation, concat=skip_start))
# output layer
self.layers.add(NodeUpdate(2*n_hidden, n_classes))
def forward(self, nf):
nf.layers[0].data['activation'] = nf.layers[0].data['features']
for i, layer in enumerate(self.layers):
h = nf.layers[i].data.pop('activation')
if self.dropout:
h = mx.nd.Dropout(h, p=self.dropout)
nf.layers[i].data['h'] = h
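# Normalize messages by each destination node's in-degree within the sampled
# block; the NodeUpdate block applies this 'norm' after the neighbor summation.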
degs = nf.layer_in_degree(i + 1).astype('float32').as_in_context(h.context)
nf.layers[i + 1].data['norm'] = mx.nd.expand_dims(1./degs, 1)
nf.block_compute(i,
fn.copy_src(src='h', out='m'),
fn.sum(msg='m', out='h'),
layer)
h = nf.layers[-1].data.pop('activation')
return h
class GCNInfer(gluon.Block):
def __init__(self,
in_feats,
n_hidden,
n_classes,
n_layers,
activation,
**kwargs):
super(GCNInfer, self).__init__(**kwargs)
self.n_layers = n_layers
with self.name_scope():
self.layers = gluon.nn.Sequential()
# input layer
skip_start = (0 == n_layers-1)
self.layers.add(NodeUpdate(in_feats, n_hidden, activation, concat=skip_start))
# hidden layers
for i in range(1, n_layers):
skip_start = (i == n_layers-1)
self.layers.add(NodeUpdate(n_hidden, n_hidden, activation, concat=skip_start))
# output layer
self.layers.add(NodeUpdate(2*n_hidden, n_classes))
def forward(self, nf):
nf.layers[0].data['activation'] = nf.layers[0].data['features']
for i, layer in enumerate(self.layers):
h = nf.layers[i].data.pop('activation')
nf.layers[i].data['h'] = h
nf.block_compute(i,
fn.copy_src(src='h', out='m'),
fn.sum(msg='m', out='h'),
layer)
return nf.layers[-1].data.pop('activation')
def gcn_ns_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples):
n0_feats = g.nodes[0].data['features']
in_feats = n0_feats.shape[1]
g_ctx = n0_feats.context
degs = g.in_degrees().astype('float32').as_in_context(g_ctx)
norm = mx.nd.expand_dims(1./degs, 1)
g.set_n_repr({'norm': norm})
model = GCNSampling(in_feats,
args.n_hidden,
n_classes,
args.n_layers,
mx.nd.relu,
args.dropout,
prefix='GCN')
model.initialize(ctx=ctx)
loss_fcn = gluon.loss.SoftmaxCELoss()
infer_model = GCNInfer(in_feats,
args.n_hidden,
n_classes,
args.n_layers,
mx.nd.relu,
prefix='GCN')
infer_model.initialize(ctx=ctx)
# use optimizer
print(model.collect_params())
trainer = gluon.Trainer(model.collect_params(), 'adam',
{'learning_rate': args.lr, 'wd': args.weight_decay},
kvstore=mx.kv.create('local'))
# initialize graph
dur = []
for epoch in range(args.n_epochs):
for nf in dgl.contrib.sampling.NeighborSampler(g, args.batch_size,
args.num_neighbors,
neighbor_type='in',
shuffle=True,
num_workers=32,
num_hops=args.n_layers+1,
seed_nodes=train_nid):
nf.copy_from_parent(ctx=ctx)
# forward
with mx.autograd.record():
pred = model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
loss = loss_fcn(pred, batch_labels)
loss = loss.sum() / len(batch_nids)
loss.backward()
trainer.step(batch_size=1)
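# At the end of each epoch, pull the trainer's current parameters out of the
# KVStore into the dropout-free inference model before evaluating.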
infer_params = infer_model.collect_params()
for key in infer_params:
idx = trainer._param2idx[key]
trainer._kvstore.pull(idx, out=infer_params[key].data())
num_acc = 0.
num_tests = 0
for nf in dgl.contrib.sampling.NeighborSampler(g, args.test_batch_size,
g.number_of_nodes(),
neighbor_type='in',
num_hops=args.n_layers+1,
seed_nodes=test_nid):
nf.copy_from_parent(ctx=ctx)
pred = infer_model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
num_acc += (pred.argmax(axis=1) == batch_labels).sum().asscalar()
num_tests += nf.layer_size(-1)
break
print("Test Accuracy {:.4f}". format(num_acc/num_tests))
import argparse, time, math
import numpy as np
import mxnet as mx
from mxnet import gluon
import dgl
import dgl.function as fn
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
class GraphSAGELayer(gluon.Block):
def __init__(self,
in_feats,
hidden,
out_feats,
dropout,
last=False,
**kwargs):
super(GraphSAGELayer, self).__init__(**kwargs)
self.last = last
self.dropout = dropout
with self.name_scope():
self.dense1 = gluon.nn.Dense(hidden, in_units=in_feats)
self.layer_norm1 = gluon.nn.LayerNorm(in_channels=hidden)
self.dense2 = gluon.nn.Dense(out_feats, in_units=hidden)
if not self.last:
self.layer_norm2 = gluon.nn.LayerNorm(in_channels=out_feats)
def forward(self, h):
h = self.dense1(h)
h = self.layer_norm1(h)
h = mx.nd.relu(h)
if self.dropout:
h = mx.nd.Dropout(h, p=self.dropout)
h = self.dense2(h)
if not self.last:
h = self.layer_norm2(h)
h = mx.nd.relu(h)
return h
class NodeUpdate(gluon.Block):
def __init__(self, layer_id, in_feats, out_feats, hidden, dropout,
test=False, last=False):
super(NodeUpdate, self).__init__()
self.layer_id = layer_id
self.dropout = dropout
self.test = test
self.last = last
with self.name_scope():
self.layer = GraphSAGELayer(in_feats, hidden, out_feats, dropout, last)
def forward(self, node):
h = node.data['h']
norm = node.data['norm']
# activation from previous layer of myself
self_h = node.data['self_h']
if self.test:
h = (h - self_h) * norm
# graphsage
h = mx.nd.concat(h, self_h)
else:
agg_history_str = 'agg_h_{}'.format(self.layer_id-1)
agg_history = node.data[agg_history_str]
# normalization constant
subg_norm = node.data['subg_norm']
# delta_h (h - history) from previous layer of myself
self_delta_h = node.data['self_delta_h']
# control variate
h = (h - self_delta_h) * subg_norm + agg_history * norm
# graphsage
h = mx.nd.concat(h, self_h)
if self.dropout:
h = mx.nd.Dropout(h, p=self.dropout)
h = self.layer(h)
return {'activation': h}
class GraphSAGETrain(gluon.Block):
def __init__(self,
in_feats,
n_hidden,
n_classes,
n_layers,
dropout,
**kwargs):
super(GraphSAGETrain, self).__init__(**kwargs)
self.dropout = dropout
with self.name_scope():
self.layers = gluon.nn.Sequential()
# input layer
self.input_layer = GraphSAGELayer(2*in_feats, n_hidden, n_hidden, dropout)
# hidden layers
for i in range(1, n_layers):
self.layers.add(NodeUpdate(i, 2*n_hidden, n_hidden, n_hidden, dropout))
# output layer
self.layers.add(NodeUpdate(n_layers, 2*n_hidden, n_classes, n_hidden, dropout, last=True))
def forward(self, nf):
h = nf.layers[0].data['preprocess']
features = nf.layers[0].data['features']
h = mx.nd.concat(h, features)
if self.dropout:
h = mx.nd.Dropout(h, p=self.dropout)
h = self.input_layer(h)
for i, layer in enumerate(self.layers):
parent_nid = dgl.utils.toindex(nf.layer_parent_nid(i+1))
layer_nid = nf.map_from_parent_nid(i, parent_nid,
remap_local=True).as_in_context(h.context)
self_h = h[layer_nid]
# activation from previous layer of myself, used in graphSAGE
nf.layers[i+1].data['self_h'] = self_h
new_history = h.copy().detach()
history_str = 'h_{}'.format(i)
history = nf.layers[i].data[history_str]
# delta_h used in control variate
delta_h = h - history
# delta_h from previous layer of the nodes in (i+1)-th layer, used in control variate
nf.layers[i+1].data['self_delta_h'] = delta_h[layer_nid]
nf.layers[i].data['h'] = delta_h
nf.block_compute(i,
fn.copy_src(src='h', out='m'),
fn.sum(msg='m', out='h'),
layer)
h = nf.layers[i+1].data.pop('activation')
# update history
if i < nf.num_layers-1:
nf.layers[i].data[history_str] = new_history
return h
class GraphSAGEInfer(gluon.Block):
def __init__(self,
in_feats,
n_hidden,
n_classes,
n_layers,
**kwargs):
super(GraphSAGEInfer, self).__init__(**kwargs)
with self.name_scope():
self.layers = gluon.nn.Sequential()
# input layer
self.input_layer = GraphSAGELayer(2*in_feats, n_hidden, n_hidden, 0)
# hidden layers
for i in range(1, n_layers):
self.layers.add(NodeUpdate(i, 2*n_hidden, n_hidden, n_hidden, 0, True))
# output layer
self.layers.add(NodeUpdate(n_layers, 2*n_hidden, n_classes, n_hidden, 0, True, last=True))
def forward(self, nf):
h = nf.layers[0].data['preprocess']
features = nf.layers[0].data['features']
h = mx.nd.concat(h, features)
h = self.input_layer(h)
for i, layer in enumerate(self.layers):
nf.layers[i].data['h'] = h
parent_nid = dgl.utils.toindex(nf.layer_parent_nid(i+1))
layer_nid = nf.map_from_parent_nid(i, parent_nid,
remap_local=True).as_in_context(h.context)
# activation from previous layer of the nodes in (i+1)-th layer, used in graphSAGE
self_h = h[layer_nid]
nf.layers[i+1].data['self_h'] = self_h
nf.block_compute(i,
fn.copy_src(src='h', out='m'),
fn.sum(msg='m', out='h'),
layer)
h = nf.layers[i+1].data.pop('activation')
return h
def graphsage_cv_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples, distributed):
n0_feats = g.nodes[0].data['features']
num_nodes = g.number_of_nodes()
in_feats = n0_feats.shape[1]
g_ctx = n0_feats.context
norm = mx.nd.expand_dims(1./g.in_degrees().astype('float32'), 1)
g.set_n_repr({'norm': norm.as_in_context(g_ctx)})
degs = g.in_degrees().astype('float32').asnumpy()
degs[degs > args.num_neighbors] = args.num_neighbors
g.set_n_repr({'subg_norm': mx.nd.expand_dims(mx.nd.array(1./degs, ctx=g_ctx), 1)})
n_layers = args.n_layers
g.update_all(fn.copy_src(src='features', out='m'),
fn.sum(msg='m', out='preprocess'),
lambda node : {'preprocess': node.data['preprocess'] * node.data['norm']})
for i in range(n_layers):
g.init_ndata('h_{}'.format(i), (num_nodes, args.n_hidden), 'float32')
g.init_ndata('agg_h_{}'.format(i), (num_nodes, args.n_hidden), 'float32')
model = GraphSAGETrain(in_feats,
args.n_hidden,
n_classes,
n_layers,
args.dropout,
prefix='GraphSAGE')
model.initialize(ctx=ctx)
loss_fcn = gluon.loss.SoftmaxCELoss()
infer_model = GraphSAGEInfer(in_feats,
args.n_hidden,
n_classes,
n_layers,
prefix='GraphSAGE')
infer_model.initialize(ctx=ctx)
# use optimizer
print(model.collect_params())
kv_type = 'dist_sync' if distributed else 'local'
trainer = gluon.Trainer(model.collect_params(), 'adam',
{'learning_rate': args.lr, 'wd': args.weight_decay},
kvstore=mx.kv.create(kv_type))
# initialize graph
dur = []
adj = g.adjacency_matrix(transpose=False).as_in_context(g_ctx)
for epoch in range(args.n_epochs):
start = time.time()
if distributed:
msg_head = "Worker {:d}, epoch {:d}".format(g.worker_id, epoch)
else:
msg_head = "epoch {:d}".format(epoch)
for nf in dgl.contrib.sampling.NeighborSampler(g, args.batch_size,
args.num_neighbors,
neighbor_type='in',
shuffle=True,
num_workers=32,
num_hops=n_layers,
add_self_loop=True,
seed_nodes=train_nid):
for i in range(n_layers):
agg_history_str = 'agg_h_{}'.format(i)
dests = nf.layer_parent_nid(i+1).as_in_context(g_ctx)
# TODO we could use DGLGraph.pull to implement this, but the current
# implementation of pull is very slow. Let's manually do it for now.
agg = mx.nd.dot(mx.nd.take(adj, dests), g.nodes[:].data['h_{}'.format(i)])
g.set_n_repr({agg_history_str: agg}, dests)
node_embed_names = [['preprocess', 'features', 'h_0']]
for i in range(1, n_layers):
node_embed_names.append(['h_{}'.format(i), 'agg_h_{}'.format(i-1), 'subg_norm', 'norm'])
node_embed_names.append(['agg_h_{}'.format(n_layers-1), 'subg_norm', 'norm'])
nf.copy_from_parent(node_embed_names=node_embed_names, ctx=ctx)
# forward
with mx.autograd.record():
pred = model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
loss = loss_fcn(pred, batch_labels)
if distributed:
loss = loss.sum() / (len(batch_nids) * g.num_workers)
else:
loss = loss.sum() / (len(batch_nids))
loss.backward()
trainer.step(batch_size=1)
node_embed_names = [['h_{}'.format(i)] for i in range(n_layers)]
node_embed_names.append([])
nf.copy_to_parent(node_embed_names=node_embed_names)
mx.nd.waitall()
print(msg_head + ': training takes ' + str(time.time() - start))
infer_params = infer_model.collect_params()
for key in infer_params:
idx = trainer._param2idx[key]
trainer._kvstore.pull(idx, out=infer_params[key].data())
num_acc = 0.
num_tests = 0
if not distributed or g.worker_id == 0:
for nf in dgl.contrib.sampling.NeighborSampler(g, args.test_batch_size,
g.number_of_nodes(),
neighbor_type='in',
num_hops=n_layers,
seed_nodes=test_nid,
add_self_loop=True):
node_embed_names = [['preprocess', 'features']]
for i in range(n_layers):
node_embed_names.append(['norm', 'subg_norm'])
nf.copy_from_parent(node_embed_names=node_embed_names, ctx=ctx)
pred = infer_model(nf)
batch_nids = nf.layer_parent_nid(-1)
batch_labels = g.nodes[batch_nids].data['labels'].as_in_context(ctx)
num_acc += (pred.argmax(axis=1) == batch_labels).sum().asscalar()
num_tests += nf.layer_size(-1)
if distributed:
g._sync_barrier()
print(msg_head + ": Test Accuracy {:.4f}". format(num_acc/num_tests))
break
elif distributed:
g._sync_barrier()
from multiprocessing import Process
import argparse, time, math
import numpy as np
import os
os.environ['OMP_NUM_THREADS'] = '16'
import mxnet as mx
from mxnet import gluon
import dgl
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
from gcn_ns_sc import gcn_ns_train
from gcn_cv_sc import gcn_cv_train
from graphsage_cv import graphsage_cv_train
def main(args):
g = dgl.contrib.graph_store.create_graph_from_store(args.graph_name, "shared_mem")
# We need to set random seed here. Otherwise, all processes have the same mini-batches.
mx.random.seed(g.worker_id)
features = g.nodes[:].data['features']
labels = g.nodes[:].data['labels']
train_mask = g.nodes[:].data['train_mask']
val_mask = g.nodes[:].data['val_mask']
test_mask = g.nodes[:].data['test_mask']
if args.num_gpus > 0:
ctx = mx.gpu(g.worker_id % args.num_gpus)
else:
ctx = mx.cpu()
train_nid = mx.nd.array(np.nonzero(train_mask.asnumpy())[0]).astype(np.int64)
test_nid = mx.nd.array(np.nonzero(test_mask.asnumpy())[0]).astype(np.int64)
n_classes = len(np.unique(labels.asnumpy()))
n_train_samples = train_mask.sum().asscalar()
n_val_samples = val_mask.sum().asscalar()
n_test_samples = test_mask.sum().asscalar()
if args.model == "gcn_ns":
gcn_ns_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples)
elif args.model == "gcn_cv":
gcn_cv_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples, True)
elif args.model == "graphsage_cv":
graphsage_cv_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples, True)
else:
print("unknown model. Please choose from gcn_ns, gcn_cv, graphsage_cv")
print("parent ends")
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
register_data_args(parser)
parser.add_argument("--model", type=str,
help="select a model. Valid models: gcn_ns, gcn_cv, graphsage_cv")
parser.add_argument("--graph-name", type=str, default="",
help="graph name")
parser.add_argument("--num-feats", type=int, default=100,
help="the number of features")
parser.add_argument("--dropout", type=float, default=0.5,
help="dropout probability")
parser.add_argument("--num-gpus", type=int, default=0,
help="the number of GPUs to train")
parser.add_argument("--lr", type=float, default=3e-2,
help="learning rate")
parser.add_argument("--n-epochs", type=int, default=200,
help="number of training epochs")
parser.add_argument("--batch-size", type=int, default=1000,
help="batch size")
parser.add_argument("--test-batch-size", type=int, default=1000,
help="test batch size")
parser.add_argument("--num-neighbors", type=int, default=3,
help="number of neighbors to be sampled")
parser.add_argument("--n-hidden", type=int, default=16,
help="number of hidden gcn units")
parser.add_argument("--n-layers", type=int, default=1,
help="number of hidden gcn layers")
parser.add_argument("--self-loop", action='store_true',
help="graph self-loop (default=False)")
parser.add_argument("--weight-decay", type=float, default=5e-4,
help="Weight for L2 loss")
args = parser.parse_args()
print(args)
main(args)
import os
import argparse, time, math
import numpy as np
from scipy import sparse as spsp
import mxnet as mx
import dgl
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
class GraphData:
def __init__(self, csr, num_feats, graph_name):
num_nodes = csr.shape[0]
self.graph = dgl.graph_index.from_csr(csr.indptr, csr.indices, False, 'in')
self.graph = self.graph.copyto_shared_mem(dgl.contrib.graph_store._get_graph_path(graph_name))
self.features = mx.nd.random.normal(shape=(csr.shape[0], num_feats))
self.num_labels = 10
self.labels = mx.nd.floor(mx.nd.random.uniform(low=0, high=self.num_labels,
shape=(csr.shape[0])))
self.train_mask = np.zeros((num_nodes,))
self.train_mask[np.arange(0, int(num_nodes/2), dtype=np.int64)] = 1
self.val_mask = np.zeros((num_nodes,))
self.val_mask[np.arange(int(num_nodes/2), int(num_nodes/4*3), dtype=np.int64)] = 1
self.test_mask = np.zeros((num_nodes,))
self.test_mask[np.arange(int(num_nodes/4*3), int(num_nodes), dtype=np.int64)] = 1
def main(args):
# load and preprocess dataset
if args.graph_file != '':
csr = mx.nd.load(args.graph_file)[0]
n_edges = csr.indices.shape[0]  # nnz of the CSR, i.e. the number of edges
graph_name = os.path.basename(args.graph_file)
data = GraphData(csr, args.num_feats, graph_name)
csr = None
else:
data = load_data(args)
n_edges = data.graph.number_of_edges()
graph_name = args.dataset
if args.self_loop and not args.dataset.startswith('reddit'):
data.graph.add_edges_from([(i,i) for i in range(len(data.graph))])
mem_ctx = mx.cpu()
features = mx.nd.array(data.features, ctx=mem_ctx)
labels = mx.nd.array(data.labels, ctx=mem_ctx)
train_mask = mx.nd.array(data.train_mask, ctx=mem_ctx)
val_mask = mx.nd.array(data.val_mask, ctx=mem_ctx)
test_mask = mx.nd.array(data.test_mask, ctx=mem_ctx)
n_classes = data.num_labels
n_train_samples = train_mask.sum().asscalar()
n_val_samples = val_mask.sum().asscalar()
n_test_samples = test_mask.sum().asscalar()
print("""----Data statistics------'
#Edges %d
#Classes %d
#Train samples %d
#Val samples %d
#Test samples %d""" %
(n_edges, n_classes,
n_train_samples,
n_val_samples,
n_test_samples))
# create GCN model
print('graph name: ' + graph_name)
g = dgl.contrib.graph_store.create_graph_store_server(data.graph, graph_name, "shared_mem",
args.num_workers, False, edge_dir='in')
g.ndata['features'] = features
g.ndata['labels'] = labels
g.ndata['train_mask'] = train_mask
g.ndata['val_mask'] = val_mask
g.ndata['test_mask'] = test_mask
g.run()
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
register_data_args(parser)
parser.add_argument("--graph-file", type=str, default="",
help="graph file")
parser.add_argument("--num-feats", type=int, default=100,
help="the number of features")
parser.add_argument("--self-loop", action='store_true',
help="graph self-loop (default=False)")
parser.add_argument("--num-workers", type=int, default=1,
help="the number of workers")
args = parser.parse_args()
main(args)
import argparse, time, math
import numpy as np
import mxnet as mx
from mxnet import gluon
from functools import partial
import dgl
import dgl.function as fn
from dgl import DGLGraph
from dgl.data import register_data_args, load_data
from gcn_ns_sc import gcn_ns_train
from gcn_cv_sc import gcn_cv_train
from graphsage_cv import graphsage_cv_train
def main(args):
# load and preprocess dataset
data = load_data(args)
if args.gpu >= 0:
ctx = mx.gpu(args.gpu)
else:
ctx = mx.cpu()
if args.self_loop and not args.dataset.startswith('reddit'):
data.graph.add_edges_from([(i,i) for i in range(len(data.graph))])
train_nid = mx.nd.array(np.nonzero(data.train_mask)[0]).astype(np.int64)
test_nid = mx.nd.array(np.nonzero(data.test_mask)[0]).astype(np.int64)
features = mx.nd.array(data.features)
labels = mx.nd.array(data.labels)
train_mask = mx.nd.array(data.train_mask)
val_mask = mx.nd.array(data.val_mask)
test_mask = mx.nd.array(data.test_mask)
in_feats = features.shape[1]
n_classes = data.num_labels
n_edges = data.graph.number_of_edges()
n_train_samples = train_mask.sum().asscalar()
n_val_samples = val_mask.sum().asscalar()
n_test_samples = test_mask.sum().asscalar()
print("""----Data statistics------'
#Edges %d
#Classes %d
#Train samples %d
#Val samples %d
#Test samples %d""" %
(n_edges, n_classes,
n_train_samples,
n_val_samples,
n_test_samples))
# create GCN model
g = dgl.DGLGraph(data.graph, readonly=True)
g.ndata['features'] = features
g.ndata['labels'] = labels
if args.model == "gcn_ns":
gcn_ns_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples)
elif args.model == "gcn_cv":
gcn_cv_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples, False)
elif args.model == "graphsage_cv":
graphsage_cv_train(g, ctx, args, n_classes, train_nid, test_nid, n_test_samples, False)
else:
print("unknown model. Please choose from gcn_ns, gcn_cv, graphsage_cv")
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='GCN')
register_data_args(parser)
parser.add_argument("--model", type=str,
help="select a model. Valid models: gcn_ns, gcn_cv, graphsage_cv")
parser.add_argument("--dropout", type=float, default=0.5,
help="dropout probability")
parser.add_argument("--gpu", type=int, default=-1,
help="gpu")
parser.add_argument("--lr", type=float, default=3e-2,
help="learning rate")
parser.add_argument("--n-epochs", type=int, default=200,
help="number of training epochs")
parser.add_argument("--batch-size", type=int, default=1000,
help="batch size")
parser.add_argument("--test-batch-size", type=int, default=1000,
help="test batch size")
parser.add_argument("--num-neighbors", type=int, default=3,
help="number of neighbors to be sampled")
parser.add_argument("--n-hidden", type=int, default=16,
help="number of hidden gcn units")
parser.add_argument("--n-layers", type=int, default=1,
help="number of hidden gcn layers")
parser.add_argument("--self-loop", action='store_true',
help="graph self-loop (default=False)")
parser.add_argument("--weight-decay", type=float, default=5e-4,
help="Weight for L2 loss")
parser.add_argument("--nworkers", type=int, default=1,
help="number of workers")
args = parser.parse_args()
print(args)
main(args)
# Adaptive sampling for graph representation learning
This is a DGL implementation of [Adaptive Sampling Towards Fast Graph Representation Learning](https://arxiv.org/abs/1809.05343).
The authors' implementation can be found [here](https://github.com/huangwb/AS-GCNN).
## Performance
Test accuracy on the cora dataset reaches 0.84 at around 250 epochs when the sample size is set to 256 for each layer.
## Usage
`python adaptive_sampling.py --batch_size 20 --node_per_layer 40`