Unverified commit e452179c authored by Mufei Li, committed by GitHub

[Deprecation] Dataset Attributes (#4666)



* Update from master (#4584)

* [Example][Refactor] Refactor graphsage multigpu and full-graph example (#4430)

* Add refactors for multi-gpu and full-graph example

* Fix format

* Update

* Update

* Update

* [Cleanup] Remove async_transferer (#4505)

* Remove async_transferer

* remove test

* Remove AsyncTransferer
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>

* [Cleanup] Remove duplicate entries of CUB submodule (issue #4395) (#4499)

* remove third_party/cub

* remove from third_party
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>

* [Bug] Enable turning libxsmm on/off at runtime (#4455)

* enable turning libxsmm on/off at runtime by adding a global config and related API
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>

* [Feature] Unify the cuda stream used in core library (#4480)

* Use an internal cuda stream for CopyDataFromTo

* small fix white space

* Fix to compile

* Make stream optional in copydata for compile

* fix lint issue

* Update cub functions to use internal stream

* Lint check

* Update CopyTo/CopyFrom/CopyFromTo to use internal stream

* Address comments

* Fix backward CUDA stream

* Avoid overloading CopyFromTo()

* Minor comment update

* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>

* [Feature] Added exclude_self and output_batch to knn graph construction (Issues #4323 #4316) (#4389)

* * Added "exclude_self" and "output_batch" options to knn_graph and segmented_knn_graph
* Updated out-of-date comments on remove_edges and remove_self_loop, since they now preserve batch information

* * Changed defaults on new knn_graph and segmented_knn_graph function parameters, for compatibility; pytorch/test_geometry.py was failing

* * Added test to ensure dgl.remove_self_loop function correctly updates batch information

* * Added new knn_graph and segmented_knn_graph parameters to dgl.nn.KNNGraph and dgl.nn.SegmentedKNNGraph

* * Formatting

* * Oops, I missed the one in segmented_knn_graph when I fixed the similar thing in knn_graph

* * Fixed edge case handling when invalid k specified, since it still needs to be handled consistently for tests to pass
* Fixed context of batch info, since it must match the context of the input position data for remove_self_loop to succeed

* * Fixed batch info resulting from knn_graph when output_batch is true, for case of 3D input tensor, representing multiple segments

* * Added testing of new exclude_self and output_batch parameters on knn_graph and segmented_knn_graph, and their wrappers, KNNGraph and SegmentedKNNGraph, into the test_knn_cuda test

* * Added doc comments for new parameters

* * Added correct handling for uncommon case of k or more coincident points when excluding self edges in knn_graph and segmented_knn_graph
* Added test cases for more than k coincident points

* * Updated doc comments for output_batch parameters for clarity

* * Linter formatting fixes

* * Extracted out common function for test_knn_cpu and test_knn_cuda, to add the new test cases to test_knn_cpu

* * Rewording in doc comments

* * Removed output_batch parameter from knn_graph and segmented_knn_graph, in favour of always setting the batch information, except in knn_graph if x is a 2D tensor
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [CI] only known devs are authorized to trigger CI (#4518)

* [CI] only known devs are authorized to trigger CI

* fix if author is null

* add comments

* [Readability] Auto fix setup.py and update-version.py (#4446)

* Auto fix update-version

* Auto fix setup.py

* Auto fix update-version

* Auto fix setup.py

* [Doc] Change random.py to random_partition.py in guide on distributed partition pipeline (#4438)

* Update distributed-preprocessing.rst

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* fix unpinning when tensoradaptor is not available (#4450)

* [Doc] fix print issue in tutorial (#4459)

* [Example][Refactor] Refactor RGCN example (#4327)

* Refactor full graph entity classification

* Refactor rgcn with sampling

* README update

* Update

* Results update

* Respect default setting of self_loop=false in entity.py

* Update

* Update README

* Update for multi-gpu

* Update

* [doc] fix invalid link in user guide (#4468)

* [Example] directional_GSN for ogbg-molpcba (#4405)

* version-1

* version-2

* version-3

* update examples/README

* Update .gitignore

* update performance in README, delete scripts

* 1st approving review

* 2nd approving review
Co-authored-by: Mufei Li <mufeili1996@gmail.com>

* Clarify the message name, which is 'm'. (#4462)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

* [Refactor] Auto fix view.py. (#4461)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [Example] SEAL for OGBL (#4291)

* [Example] SEAL for OGBL

* update index

* update

* fix readme typo

* add seal sampler

* modify set ops

* prefetch

* efficiency test

* update

* optimize

* fix ScatterAdd dtype issue

* update sampler style

* update
Co-authored-by: Quan Gan <coin2028@hotmail.com>

* [CI] use https instead of http (#4488)

* [BugFix] fix crash due to incorrect dtype in dgl.to_block() (#4487)

* [BugFix] fix crash due to incorrect dtype in dgl.to_block()

* fix test failure in TF

* [Feature] Make TensorAdapter Stream Aware (#4472)

* Allocate tensors in DGL's current stream

* make tensoradaptor stream-aware

* replace TAempty with cpu allocator

* fix typo

* try fix cpu allocation

* clean header

* redirect AllocDataSpace as well

* resolve comments

* [Build][Doc] Specify the sphinx version (#4465)
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* reformat

* reformat

* Auto fix update-version

* Auto fix setup.py

* reformat

* reformat
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>

* Move mock version of dgl_sparse library to DGL main repo (#4524)

* init

* Add api doc for sparse library

* support ops between matrices with different sparsity

* Fixed docstring

* addresses comments

* lint check

* change keyword format to fmt
Co-authored-by: Israt Nisa <nisisrat@amazon.com>

* [DistPart] expose timeout config for process group (#4532)

* [DistPart] expose timeout config for process group

* refine code

* Update tools/distpartitioning/data_proc_pipeline.py
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

* [Feature] Import PyTorch's CUDA stream management (#4503)

* add set_stream

* add .record_stream for NDArray and HeteroGraph

* refactor dgl stream Python APIs

* test record_stream

* add unit test for record stream

* use pytorch's stream

* fix lint

* fix cpu build

* address comments

* address comments

* add record stream tests for dgl.graph

* record frames and update dataloder

* add docstring

* update frame

* add backend check for record_stream

* remove CUDAThreadEntry::stream

* record stream for newly created formats

* fix bug

* fix cpp test

* fix None c_void_p to c_handle

* [examples] Reduce memory consumption (#4558)

* [examples] Reduce memory consumption

* refine help message

* refine

* [Feature][REVIEW] Enable DGL cugraph nightly CI (#4525)

* Added cugraph nightly scripts

* Removed nvcr.io//nvidia/pytorch:22.04-py3 reference
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>

* Revert "[Feature][REVIEW] Enable DGL cugraph nightly CI (#4525)" (#4563)

This reverts commit ec171c64.

* [Misc] Add flake8 lint workflow. (#4566)

* Add pyproject.toml for autopep8.

* Add pyproject.toml for autopep8.

* Add flake8 annotation in workflow.

* remove

* add

* clean up
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Try use official pylint workflow. (#4568)

* polish update_version

* update pylint workflow.

* add

* revert.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [CI] refine stage logic (#4565)

* [CI] refine stage logic

* refine

* refine

* remove (#4570)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Add Pylint workflow for flake8. (#4571)

* remove

* Add pylint.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Update the python version in Pylint workflow for flake8. (#4572)

* remove

* Add pylint.

* Change the python version for pylint.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint. (#4574)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Misc] Use another workflow. (#4575)

* Update pylint.

* Use another workflow.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint. (#4576)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* Update pylint.yml

* Update pylint.yml

* Delete pylint.yml

* [Misc]Add pyproject.toml for autopep8 & black. (#4543)

* Add pyproject.toml for autopep8.

* Add pyproject.toml for autopep8.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>

* [Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454)

* rename `DLContext` to `DGLContext`

* rename `kDLGPU` to `kDLCUDA`

* replace DLTensor with DGLArray

* fix linting

* Unify DGLType and DLDataType to DGLDataType

* Fix FFI

* rename DLDeviceType to DGLDeviceType

* decouple dlpack from the core library

* fix bug

* fix lint

* fix merge

* fix build

* address comments

* rename dl_converter to dlpack_convert

* remove redundant comments
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>

* [Deprecation] Dataset Attributes (#4546)

* Update

* CI

* CI

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* [Example] Bug Fix (#4665)

* Update

* CI

* CI

* Update

* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>

* Update
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>
parent f846d902
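The attributes removed in this diff were thin deprecation shims: properties that warn and forward to data now stored on the graph itself. A minimal stdlib-only sketch of that shim pattern (the class and its contents are illustrative stand-ins, not DGL's actual implementation):

```python
import warnings

def deprecate_property(old, new):
    # Analogous to the helper called in the removed shims: warn on each access.
    warnings.warn(
        "Property {} will be deprecated, please use {} instead.".format(old, new),
        DeprecationWarning, stacklevel=2)

class LegacyDataset:
    """Toy dataset keeping node data in a dict, standing in for g.ndata."""
    def __init__(self):
        self._ndata = {"feat": [1.0, 2.0], "label": [0, 1]}

    @property
    def features(self):
        # Old access path dataset.features forwards to the new storage location.
        deprecate_property("dataset.features", "g.ndata['feat']")
        return self._ndata["feat"]

ds = LegacyDataset()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    feat = ds.features
print(feat)         # the data still comes back
print(len(caught))  # one DeprecationWarning was recorded
```

Removing the shims ends this compatibility window: callers must read from the graph directly.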
@@ -23,18 +23,18 @@ def main(args):
     # load and preprocess dataset
     data = load_data(args)
     g = data[0]
-    features = torch.FloatTensor(data.features)
-    labels = torch.LongTensor(data.labels)
+    features = torch.FloatTensor(g.ndata['feat'])
+    labels = torch.LongTensor(g.ndata['label'])
     if hasattr(torch, 'BoolTensor'):
-        train_mask = torch.BoolTensor(data.train_mask)
-        val_mask = torch.BoolTensor(data.val_mask)
-        test_mask = torch.BoolTensor(data.test_mask)
+        train_mask = torch.BoolTensor(g.ndata['train_mask'])
+        val_mask = torch.BoolTensor(g.ndata['val_mask'])
+        test_mask = torch.BoolTensor(g.ndata['test_mask'])
     else:
-        train_mask = torch.ByteTensor(data.train_mask)
-        val_mask = torch.ByteTensor(data.val_mask)
-        test_mask = torch.ByteTensor(data.test_mask)
+        train_mask = torch.ByteTensor(g.ndata['train_mask'])
+        val_mask = torch.ByteTensor(g.ndata['val_mask'])
+        test_mask = torch.ByteTensor(g.ndata['test_mask'])
     in_feats = features.shape[1]
-    n_classes = data.num_labels
+    n_classes = data.num_classes
     n_edges = g.number_of_edges()
     if args.gpu < 0:
@@ -130,7 +130,7 @@ def main(args):
         loss = F.nll_loss(preds[train_mask], labels[train_mask])
         loss.backward()
         classifier_optimizer.step()
         if epoch >= 3:
             dur.append(time.time() - t0)
@@ -171,5 +171,5 @@ if __name__ == '__main__':
     parser.set_defaults(self_loop=False)
     args = parser.parse_args()
     print(args)
     main(args)
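The updated example above reads features, labels, and split masks from `g.ndata` instead of dataset attributes. The core idiom is boolean-mask indexing over per-node arrays; a stdlib-only sketch (`ndata` here is a plain dict standing in for the graph's node data):

```python
# Stand-in for g.ndata: per-node feature, label, and split-mask arrays.
ndata = {
    "feat": [[0.1], [0.2], [0.3], [0.4]],
    "label": [0, 1, 1, 0],
    "train_mask": [True, True, False, False],
    "test_mask": [False, False, True, True],
}

def masked(values, mask):
    # Pure-Python equivalent of tensor indexing values[mask].
    return [v for v, m in zip(values, mask) if m]

train_labels = masked(ndata["label"], ndata["train_mask"])
test_feats = masked(ndata["feat"], ndata["test_mask"])
print(train_labels)  # → [0, 1]
print(test_feats)    # → [[0.3], [0.4]]
```

With real tensors the same selection is simply `labels[train_mask]`, as in the training loop above.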
@@ -251,31 +251,6 @@ class CitationGraphDataset(DGLBuiltinDataset):
     We preserve these properties for compatability.
     """
-
-    @property
-    def train_mask(self):
-        deprecate_property('dataset.train_mask', 'g.ndata[\'train_mask\']')
-        return F.asnumpy(self._g.ndata['train_mask'])
-
-    @property
-    def val_mask(self):
-        deprecate_property('dataset.val_mask', 'g.ndata[\'val_mask\']')
-        return F.asnumpy(self._g.ndata['val_mask'])
-
-    @property
-    def test_mask(self):
-        deprecate_property('dataset.test_mask', 'g.ndata[\'test_mask\']')
-        return F.asnumpy(self._g.ndata['test_mask'])
-
-    @property
-    def labels(self):
-        deprecate_property('dataset.label', 'g.ndata[\'label\']')
-        return F.asnumpy(self._g.ndata['label'])
-
-    @property
-    def features(self):
-        deprecate_property('dataset.feat', 'g.ndata[\'feat\']')
-        return self._g.ndata['feat']
     @property
     def reverse_edge(self):
         return self._reverse_edge
@@ -306,43 +281,6 @@ def _sample_mask(idx, l):
 class CoraGraphDataset(CitationGraphDataset):
     r""" Cora citation network dataset.
-
-    .. deprecated:: 0.5.0
-
-        - ``graph`` is deprecated, it is replaced by:
-
-            >>> dataset = CoraGraphDataset()
-            >>> graph = dataset[0]
-
-        - ``train_mask`` is deprecated, it is replaced by:
-
-            >>> dataset = CoraGraphDataset()
-            >>> graph = dataset[0]
-            >>> train_mask = graph.ndata['train_mask']
-
-        - ``val_mask`` is deprecated, it is replaced by:
-
-            >>> dataset = CoraGraphDataset()
-            >>> graph = dataset[0]
-            >>> val_mask = graph.ndata['val_mask']
-
-        - ``test_mask`` is deprecated, it is replaced by:
-
-            >>> dataset = CoraGraphDataset()
-            >>> graph = dataset[0]
-            >>> test_mask = graph.ndata['test_mask']
-
-        - ``labels`` is deprecated, it is replaced by:
-
-            >>> dataset = CoraGraphDataset()
-            >>> graph = dataset[0]
-            >>> labels = graph.ndata['label']
-
-        - ``feat`` is deprecated, it is replaced by:
-
-            >>> dataset = CoraGraphDataset()
-            >>> graph = dataset[0]
-            >>> feat = graph.ndata['feat']
-
     Nodes mean paper and edges mean citation
     relationships. Each node has a predefined
     feature with 1433 dimensions. The dataset is
@@ -383,18 +321,6 @@ class CoraGraphDataset(CitationGraphDataset):
     ----------
     num_classes: int
         Number of label classes
-    graph: networkx.DiGraph
-        Graph structure
-    train_mask: numpy.ndarray
-        Mask of training nodes
-    val_mask: numpy.ndarray
-        Mask of validation nodes
-    test_mask: numpy.ndarray
-        Mask of test nodes
-    labels: numpy.ndarray
-        Ground truth labels of each node
-    features: Tensor
-        Node features
 
     Notes
     -----
@@ -454,43 +380,6 @@ class CoraGraphDataset(CitationGraphDataset):
 class CiteseerGraphDataset(CitationGraphDataset):
     r""" Citeseer citation network dataset.
-
-    .. deprecated:: 0.5.0
-
-        - ``graph`` is deprecated, it is replaced by:
-
-            >>> dataset = CiteseerGraphDataset()
-            >>> graph = dataset[0]
-
-        - ``train_mask`` is deprecated, it is replaced by:
-
-            >>> dataset = CiteseerGraphDataset()
-            >>> graph = dataset[0]
-            >>> train_mask = graph.ndata['train_mask']
-
-        - ``val_mask`` is deprecated, it is replaced by:
-
-            >>> dataset = CiteseerGraphDataset()
-            >>> graph = dataset[0]
-            >>> val_mask = graph.ndata['val_mask']
-
-        - ``test_mask`` is deprecated, it is replaced by:
-
-            >>> dataset = CiteseerGraphDataset()
-            >>> graph = dataset[0]
-            >>> test_mask = graph.ndata['test_mask']
-
-        - ``labels`` is deprecated, it is replaced by:
-
-            >>> dataset = CiteseerGraphDataset()
-            >>> graph = dataset[0]
-            >>> labels = graph.ndata['label']
-
-        - ``feat`` is deprecated, it is replaced by:
-
-            >>> dataset = CiteseerGraphDataset()
-            >>> graph = dataset[0]
-            >>> feat = graph.ndata['feat']
-
     Nodes mean scientific publications and edges
     mean citation relationships. Each node has a
     predefined feature with 3703 dimensions. The
@@ -531,18 +420,6 @@ class CiteseerGraphDataset(CitationGraphDataset):
     ----------
     num_classes: int
         Number of label classes
-    graph: networkx.DiGraph
-        Graph structure
-    train_mask: numpy.ndarray
-        Mask of training nodes
-    val_mask: numpy.ndarray
-        Mask of validation nodes
-    test_mask: numpy.ndarray
-        Mask of test nodes
-    labels: numpy.ndarray
-        Ground truth labels of each node
-    features: Tensor
-        Node features
 
     Notes
     -----
@@ -605,43 +482,6 @@ class PubmedGraphDataset(CitationGraphDataset):
 class PubmedGraphDataset(CitationGraphDataset):
     r""" Pubmed citation network dataset.
-
-    .. deprecated:: 0.5.0
-
-        - ``graph`` is deprecated, it is replaced by:
-
-            >>> dataset = PubmedGraphDataset()
-            >>> graph = dataset[0]
-
-        - ``train_mask`` is deprecated, it is replaced by:
-
-            >>> dataset = PubmedGraphDataset()
-            >>> graph = dataset[0]
-            >>> train_mask = graph.ndata['train_mask']
-
-        - ``val_mask`` is deprecated, it is replaced by:
-
-            >>> dataset = PubmedGraphDataset()
-            >>> graph = dataset[0]
-            >>> val_mask = graph.ndata['val_mask']
-
-        - ``test_mask`` is deprecated, it is replaced by:
-
-            >>> dataset = PubmedGraphDataset()
-            >>> graph = dataset[0]
-            >>> test_mask = graph.ndata['test_mask']
-
-        - ``labels`` is deprecated, it is replaced by:
-
-            >>> dataset = PubmedGraphDataset()
-            >>> graph = dataset[0]
-            >>> labels = graph.ndata['label']
-
-        - ``feat`` is deprecated, it is replaced by:
-
-            >>> dataset = PubmedGraphDataset()
-            >>> graph = dataset[0]
-            >>> feat = graph.ndata['feat']
-
     Nodes mean scientific publications and edges
     mean citation relationships. Each node has a
     predefined feature with 500 dimensions. The
@@ -682,18 +522,6 @@ class PubmedGraphDataset(CitationGraphDataset):
     ----------
     num_classes: int
         Number of label classes
-    graph: networkx.DiGraph
-        Graph structure
-    train_mask: numpy.ndarray
-        Mask of training nodes
-    val_mask: numpy.ndarray
-        Mask of validation nodes
-    test_mask: numpy.ndarray
-        Mask of test nodes
-    labels: numpy.ndarray
-        Ground truth labels of each node
-    features: Tensor
-        Node features
 
     Notes
     -----
@@ -106,11 +106,6 @@ class GNNBenchmarkDataset(DGLBuiltinDataset):
         """Number of classes."""
         raise NotImplementedError
 
-    @property
-    def data(self):
-        deprecate_property('dataset.data', 'dataset[0]')
-        return self._data
-
     def __getitem__(self, idx):
         r""" Get graph by index
@@ -142,13 +137,6 @@ class GNNBenchmarkDataset(DGLBuiltinDataset):
 class CoraFullDataset(GNNBenchmarkDataset):
     r"""CORA-Full dataset for node classification task.
-
-    .. deprecated:: 0.5.0
-
-        - ``data`` is deprecated, it is repalced by:
-
-            >>> dataset = CoraFullDataset()
-            >>> graph = dataset[0]
-
     Extended Cora dataset. Nodes represent paper and edges represent citations.
 
     Reference: `<https://github.com/shchur/gnn-benchmark#datasets>`_
@@ -179,8 +167,6 @@ class CoraFullDataset(GNNBenchmarkDataset):
     ----------
     num_classes : int
         Number of classes for each node.
-    data : list
-        A list of DGLGraph objects
 
     Examples
     --------
@@ -211,13 +197,6 @@ class CoraFullDataset(GNNBenchmarkDataset):
 class CoauthorCSDataset(GNNBenchmarkDataset):
     r""" 'Computer Science (CS)' part of the Coauthor dataset for node classification task.
-
-    .. deprecated:: 0.5.0
-
-        - ``data`` is deprecated, it is repalced by:
-
-            >>> dataset = CoauthorCSDataset()
-            >>> graph = dataset[0]
-
     Coauthor CS and Coauthor Physics are co-authorship graphs based on the Microsoft Academic Graph
     from the KDD Cup 2016 challenge. Here, nodes are authors, that are connected by an edge if they
     co-authored a paper; node features represent paper keywords for each author’s papers, and class
@@ -251,8 +230,6 @@ class CoauthorCSDataset(GNNBenchmarkDataset):
     ----------
     num_classes : int
         Number of classes for each node.
-    data : list
-        A list of DGLGraph objects
 
     Examples
     --------
@@ -283,13 +260,6 @@ class CoauthorCSDataset(GNNBenchmarkDataset):
 class CoauthorPhysicsDataset(GNNBenchmarkDataset):
     r""" 'Physics' part of the Coauthor dataset for node classification task.
-
-    .. deprecated:: 0.5.0
-
-        - ``data`` is deprecated, it is repalced by:
-
-            >>> dataset = CoauthorPhysicsDataset()
-            >>> graph = dataset[0]
-
     Coauthor CS and Coauthor Physics are co-authorship graphs based on the Microsoft Academic Graph
     from the KDD Cup 2016 challenge. Here, nodes are authors, that are connected by an edge if they
     co-authored a paper; node features represent paper keywords for each author’s papers, and class
@@ -323,8 +293,6 @@ class CoauthorPhysicsDataset(GNNBenchmarkDataset):
     ----------
     num_classes : int
         Number of classes for each node.
-    data : list
-        A list of DGLGraph objects
 
     Examples
     --------
@@ -355,13 +323,6 @@ class CoauthorPhysicsDataset(GNNBenchmarkDataset):
 class AmazonCoBuyComputerDataset(GNNBenchmarkDataset):
     r""" 'Computer' part of the AmazonCoBuy dataset for node classification task.
-
-    .. deprecated:: 0.5.0
-
-        - ``data`` is deprecated, it is repalced by:
-
-            >>> dataset = AmazonCoBuyComputerDataset()
-            >>> graph = dataset[0]
-
     Amazon Computers and Amazon Photo are segments of the Amazon co-purchase graph [McAuley et al., 2015],
     where nodes represent goods, edges indicate that two goods are frequently bought together, node
     features are bag-of-words encoded product reviews, and class labels are given by the product category.
@@ -394,8 +355,6 @@ class AmazonCoBuyComputerDataset(GNNBenchmarkDataset):
     ----------
     num_classes : int
         Number of classes for each node.
-    data : list
-        A list of DGLGraph objects
 
     Examples
     --------
@@ -426,13 +385,6 @@ class AmazonCoBuyComputerDataset(GNNBenchmarkDataset):
 class AmazonCoBuyPhotoDataset(GNNBenchmarkDataset):
     r"""AmazonCoBuy dataset for node classification task.
-
-    .. deprecated:: 0.5.0
-
-        - ``data`` is deprecated, it is repalced by:
-
-            >>> dataset = AmazonCoBuyPhotoDataset()
-            >>> graph = dataset[0]
-
     Amazon Computers and Amazon Photo are segments of the Amazon co-purchase graph [McAuley et al., 2015],
     where nodes represent goods, edges indicate that two goods are frequently bought together, node
     features are bag-of-words encoded product reviews, and class labels are given by the product category.
@@ -465,8 +417,6 @@ class AmazonCoBuyPhotoDataset(GNNBenchmarkDataset):
     ----------
     num_classes : int
         Number of classes for each node.
-    data : list
-        A list of DGLGraph objects
 
     Examples
     --------
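Each removed `data` property is superseded by indexing the dataset itself. The `dataset[0]` access pattern only needs `__getitem__`; a toy sketch (the class and its contents are hypothetical, not DGL code):

```python
class ToyDataset:
    """Minimal dataset exposing graphs via indexing instead of a .data list."""
    def __init__(self):
        self._graphs = ["g0"]  # stand-in for a list of DGLGraph objects

    def __len__(self):
        return len(self._graphs)

    def __getitem__(self, idx):
        return self._graphs[idx]

ds = ToyDataset()
graph = ds[0]  # replaces the deprecated `ds.data` access
print(graph)   # → g0
```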
@@ -14,13 +14,6 @@ __all__ = ['KarateClubDataset', 'KarateClub']
 class KarateClubDataset(DGLDataset):
     r""" Karate Club dataset for Node Classification
-
-    .. deprecated:: 0.5.0
-
-        - ``data`` is deprecated, it is replaced by:
-
-            >>> dataset = KarateClubDataset()
-            >>> g = dataset[0]
-
     Zachary's karate club is a social network of a university
     karate club, described in the paper "An Information Flow
     Model for Conflict and Fission in Small Groups" by Wayne W. Zachary.
@@ -45,8 +38,6 @@ class KarateClubDataset(DGLDataset):
     ----------
     num_classes : int
         Number of node classes
-    data : list
-        A list of :class:`dgl.DGLGraph` objects
 
     Examples
     --------
@@ -73,11 +64,6 @@ class KarateClubDataset(DGLDataset):
         """Number of classes."""
         return 2
 
-    @property
-    def data(self):
-        deprecate_property('dataset.data', 'dataset[0]')
-        return self._data
-
     def __getitem__(self, idx):
         r""" Get graph object
@@ -191,21 +191,6 @@ class KnowledgeGraphDataset(DGLBuiltinDataset):
     def save_name(self):
         return self.name + '_dgl_graph'
-    @property
-    def train(self):
-        deprecate_property('dataset.train', 'g.edata[\'train_mask\']')
-        return self._train
-    @property
-    def valid(self):
-        deprecate_property('dataset.valid', 'g.edata[\'val_mask\']')
-        return self._valid
-    @property
-    def test(self):
-        deprecate_property('dataset.test', 'g.edata[\'test_mask\']')
-        return self._test
 def _read_dictionary(filename):
     d = {}
     with open(filename, 'r+') as f:
@@ -344,35 +329,6 @@ def build_knowledge_graph(num_nodes, num_rels, train, valid, test, reverse=True)
 class FB15k237Dataset(KnowledgeGraphDataset):
     r"""FB15k237 link prediction dataset.
-    .. deprecated:: 0.5.0
-        - ``train`` is deprecated, it is replaced by:
-            >>> dataset = FB15k237Dataset()
-            >>> graph = dataset[0]
-            >>> train_mask = graph.edata['train_mask']
-            >>> train_idx = th.nonzero(train_mask, as_tuple=False).squeeze()
-            >>> src, dst = graph.find_edges(train_idx)
-            >>> rel = graph.edata['etype'][train_idx]
-        - ``valid`` is deprecated, it is replaced by:
-            >>> dataset = FB15k237Dataset()
-            >>> graph = dataset[0]
-            >>> val_mask = graph.edata['val_mask']
-            >>> val_idx = th.nonzero(val_mask, as_tuple=False).squeeze()
-            >>> src, dst = graph.find_edges(val_idx)
-            >>> rel = graph.edata['etype'][val_idx]
-        - ``test`` is deprecated, it is replaced by:
-            >>> dataset = FB15k237Dataset()
-            >>> graph = dataset[0]
-            >>> test_mask = graph.edata['test_mask']
-            >>> test_idx = th.nonzero(test_mask, as_tuple=False).squeeze()
-            >>> src, dst = graph.find_edges(test_idx)
-            >>> rel = graph.edata['etype'][test_idx]
     FB15k-237 is a subset of FB15k where inverse
     relations are removed. When creating the dataset,
     a reverse edge with reversed relation types are
@@ -411,12 +367,6 @@ class FB15k237Dataset(KnowledgeGraphDataset):
         Number of nodes
     num_rels: int
         Number of relation types
-    train: numpy.ndarray
-        A numpy array of triplets (src, rel, dst) for the training graph
-    valid: numpy.ndarray
-        A numpy array of triplets (src, rel, dst) for the validation graph
-    test: numpy.ndarray
-        A numpy array of triplets (src, rel, dst) for the test graph
     Examples
     ----------
@@ -484,35 +434,6 @@ class FB15k237Dataset(KnowledgeGraphDataset):
 class FB15kDataset(KnowledgeGraphDataset):
     r"""FB15k link prediction dataset.
-    .. deprecated:: 0.5.0
-        - ``train`` is deprecated, it is replaced by:
-            >>> dataset = FB15kDataset()
-            >>> graph = dataset[0]
-            >>> train_mask = graph.edata['train_mask']
-            >>> train_idx = th.nonzero(train_mask, as_tuple=False).squeeze()
-            >>> src, dst = graph.edges(train_idx)
-            >>> rel = graph.edata['etype'][train_idx]
-        - ``valid`` is deprecated, it is replaced by:
-            >>> dataset = FB15kDataset()
-            >>> graph = dataset[0]
-            >>> val_mask = graph.edata['val_mask']
-            >>> val_idx = th.nonzero(val_mask, as_tuple=False).squeeze()
-            >>> src, dst = graph.edges(val_idx)
-            >>> rel = graph.edata['etype'][val_idx]
-        - ``test`` is deprecated, it is replaced by:
-            >>> dataset = FB15kDataset()
-            >>> graph = dataset[0]
-            >>> test_mask = graph.edata['test_mask']
-            >>> test_idx = th.nonzero(test_mask, as_tuple=False).squeeze()
-            >>> src, dst = graph.edges(test_idx)
-            >>> rel = graph.edata['etype'][test_idx]
 The FB15K dataset was introduced in `Translating Embeddings for Modeling
 Multi-relational Data <http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf>`_.
 It is a subset of Freebase which contains about
@@ -554,12 +475,6 @@ class FB15kDataset(KnowledgeGraphDataset):
         Number of nodes
     num_rels: int
         Number of relation types
-    train: numpy.ndarray
-        A numpy array of triplets (src, rel, dst) for the training graph
-    valid: numpy.ndarray
-        A numpy array of triplets (src, rel, dst) for the validation graph
-    test: numpy.ndarray
-        A numpy array of triplets (src, rel, dst) for the test graph
     Examples
     ----------
@@ -627,35 +542,6 @@ class FB15kDataset(KnowledgeGraphDataset):
 class WN18Dataset(KnowledgeGraphDataset):
     r""" WN18 link prediction dataset.
-    .. deprecated:: 0.5.0
-        - ``train`` is deprecated, it is replaced by:
-            >>> dataset = WN18Dataset()
-            >>> graph = dataset[0]
-            >>> train_mask = graph.edata['train_mask']
-            >>> train_idx = th.nonzero(train_mask, as_tuple=False).squeeze()
-            >>> src, dst = graph.edges(train_idx)
-            >>> rel = graph.edata['etype'][train_idx]
-        - ``valid`` is deprecated, it is replaced by:
-            >>> dataset = WN18Dataset()
-            >>> graph = dataset[0]
-            >>> val_mask = graph.edata['val_mask']
-            >>> val_idx = th.nonzero(val_mask, as_tuple=False).squeeze()
-            >>> src, dst = graph.edges(val_idx)
-            >>> rel = graph.edata['etype'][val_idx]
-        - ``test`` is deprecated, it is replaced by:
-            >>> dataset = WN18Dataset()
-            >>> graph = dataset[0]
-            >>> test_mask = graph.edata['test_mask']
-            >>> test_idx = th.nonzero(test_mask, as_tuple=False).squeeze()
-            >>> src, dst = graph.edges(test_idx)
-            >>> rel = graph.edata['etype'][test_idx]
 The WN18 dataset was introduced in `Translating Embeddings for Modeling
 Multi-relational Data <http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf>`_.
 It included the full 18 relations scraped from
@@ -696,12 +582,6 @@ class WN18Dataset(KnowledgeGraphDataset):
         Number of nodes
     num_rels: int
         Number of relation types
-    train: numpy.ndarray
-        A numpy array of triplets (src, rel, dst) for the training graph
-    valid: numpy.ndarray
-        A numpy array of triplets (src, rel, dst) for the validation graph
-    test: numpy.ndarray
-        A numpy array of triplets (src, rel, dst) for the test graph
     Examples
     ----------
...
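The replacement recipe in the removed docstrings (boolean edge masks plus ``etype``, instead of stored triplet arrays) can be sketched with NumPy. The toy arrays below are hypothetical stand-ins for ``graph.edata`` and the edge endpoints returned by ``graph.find_edges``:

```python
import numpy as np

# Hypothetical 5-edge toy graph standing in for a DGL knowledge graph.
src = np.array([0, 1, 2, 3, 4])  # edge source nodes
dst = np.array([1, 2, 3, 4, 0])  # edge destination nodes
edata = {
    "train_mask": np.array([True, True, False, False, True]),
    "etype": np.array([0, 1, 2, 0, 1]),
}

# Same recipe as the docstrings: mask -> edge indices -> (src, rel, dst) triplets.
train_idx = np.nonzero(edata["train_mask"])[0]
train_triplets = np.stack(
    [src[train_idx], edata["etype"][train_idx], dst[train_idx]], axis=1)
print(train_triplets.tolist())  # [[0, 0, 1], [1, 1, 2], [4, 1, 0]]
```

The validation and test splits follow identically with ``val_mask`` and ``test_mask``, which is why the per-split ndarray attributes became redundant.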
@@ -15,48 +15,6 @@ from ..transforms import reorder_graph
 class RedditDataset(DGLBuiltinDataset):
     r""" Reddit dataset for community detection (node classification)
-    .. deprecated:: 0.5.0
-        - ``graph`` is deprecated, it is replaced by:
-            >>> dataset = RedditDataset()
-            >>> graph = dataset[0]
-        - ``num_labels`` is deprecated, it is replaced by:
-            >>> dataset = RedditDataset()
-            >>> num_classes = dataset.num_classes
-        - ``train_mask`` is deprecated, it is replaced by:
-            >>> dataset = RedditDataset()
-            >>> graph = dataset[0]
-            >>> train_mask = graph.ndata['train_mask']
-        - ``val_mask`` is deprecated, it is replaced by:
-            >>> dataset = RedditDataset()
-            >>> graph = dataset[0]
-            >>> val_mask = graph.ndata['val_mask']
-        - ``test_mask`` is deprecated, it is replaced by:
-            >>> dataset = RedditDataset()
-            >>> graph = dataset[0]
-            >>> test_mask = graph.ndata['test_mask']
-        - ``features`` is deprecated, it is replaced by:
-            >>> dataset = RedditDataset()
-            >>> graph = dataset[0]
-            >>> features = graph.ndata['feat']
-        - ``labels`` is deprecated, it is replaced by:
-            >>> dataset = RedditDataset()
-            >>> graph = dataset[0]
-            >>> labels = graph.ndata['label']
     This is a graph dataset from Reddit posts made in the month of September, 2014.
     The node label in this case is the community, or “subreddit”, that a post belongs to.
     The authors sampled 50 large communities and built a post-to-post graph, connecting
@@ -95,20 +53,6 @@ class RedditDataset(DGLBuiltinDataset):
     ----------
     num_classes : int
         Number of classes for each node
-    graph : :class:`dgl.DGLGraph`
-        Graph of the dataset
-    num_labels : int
-        Number of classes for each node
-    train_mask: numpy.ndarray
-        Mask of training nodes
-    val_mask: numpy.ndarray
-        Mask of validation nodes
-    test_mask: numpy.ndarray
-        Mask of test nodes
-    features : Tensor
-        Node features
-    labels : Tensor
-        Node labels
     Examples
     --------
@@ -202,41 +146,6 @@ class RedditDataset(DGLBuiltinDataset):
         r"""Number of classes for each node."""
         return 41
-    @property
-    def num_labels(self):
-        deprecate_property('dataset.num_labels', 'dataset.num_classes')
-        return self.num_classes
-    @property
-    def graph(self):
-        deprecate_property('dataset.graph', 'dataset[0]')
-        return self._graph
-    @property
-    def train_mask(self):
-        deprecate_property('dataset.train_mask', 'graph.ndata[\'train_mask\']')
-        return F.asnumpy(self._graph.ndata['train_mask'])
-    @property
-    def val_mask(self):
-        deprecate_property('dataset.val_mask', 'graph.ndata[\'val_mask\']')
-        return F.asnumpy(self._graph.ndata['val_mask'])
-    @property
-    def test_mask(self):
-        deprecate_property('dataset.test_mask', 'graph.ndata[\'test_mask\']')
-        return F.asnumpy(self._graph.ndata['test_mask'])
-    @property
-    def features(self):
-        deprecate_property('dataset.features', 'graph.ndata[\'feat\']')
-        return self._graph.ndata['feat']
-    @property
-    def labels(self):
-        deprecate_property('dataset.labels', 'graph.ndata[\'label\']')
-        return self._graph.ndata['label']
     def __getitem__(self, idx):
         r""" Get graph by index
...
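All seven removed Reddit properties reduce to lookups on the graph's node-data dictionary. A minimal sketch with a hypothetical toy ``ndata`` dict (NumPy arrays stand in for the graph's tensors):

```python
import numpy as np

# Hypothetical 4-node toy graph standing in for the Reddit DGLGraph's ndata.
ndata = {
    "feat": np.arange(8.0).reshape(4, 2),
    "label": np.array([0, 1, 1, 0]),
    "train_mask": np.array([True, False, True, False]),
}

# The removed properties all become dictionary lookups:
features = ndata["feat"]   # was dataset.features
labels = ndata["label"]    # was dataset.labels
# Boolean-mask indexing selects the training split directly:
train_labels = labels[ndata["train_mask"]]
print(train_labels.tolist())  # [0, 1]
```

Keeping splits as masks on the graph (rather than separate arrays on the dataset) means a single object carries features, labels, and splits together.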
@@ -22,16 +22,6 @@ __all__ = ['SST', 'SSTDataset']
 class SSTDataset(DGLBuiltinDataset):
     r"""Stanford Sentiment Treebank dataset.
-    .. deprecated:: 0.5.0
-        - ``trees`` is deprecated, it is replaced by:
-            >>> dataset = SSTDataset()
-            >>> for tree in dataset:
-            ...     # your code here
-        - ``num_vocabs`` is deprecated, it is replaced by ``vocab_size``.
     Each sample is the constituency tree of a sentence. The leaf nodes
     represent words. The word is an int value stored in the ``x`` feature field.
     The non-leaf node has a special value ``PAD_WORD`` in the ``x`` field.
@@ -74,16 +64,12 @@ class SSTDataset(DGLBuiltinDataset):
     ----------
     vocab : OrderedDict
         Vocabulary of the dataset
-    trees : list
-        A list of DGLGraph objects
     num_classes : int
         Number of classes for each node
     pretrained_emb: Tensor
         Pretrained glove embedding with respect to the vocabulary.
     vocab_size : int
         The size of the vocabulary
-    num_vocabs : int
-        The size of the vocabulary
     Notes
     -----
@@ -224,11 +210,6 @@ class SSTDataset(DGLBuiltinDataset):
         if os.path.exists(emb_path):
             self._pretrained_emb = load_info(emb_path)['embed']
-    @property
-    def trees(self):
-        deprecate_property('dataset.trees', '[dataset[i] for i in len(dataset)]')
-        return self._trees
     @property
     def vocab(self):
         r""" Vocabulary
@@ -270,11 +251,6 @@ class SSTDataset(DGLBuiltinDataset):
         r"""Number of graphs in the dataset."""
         return len(self._trees)
-    @property
-    def num_vocabs(self):
-        deprecate_property('dataset.num_vocabs', 'dataset.vocab_size')
-        return self.vocab_size
     @property
     def vocab_size(self):
         r"""Vocabulary size."""
...
@@ -18,7 +18,7 @@ using namespace cuda;
 namespace aten {
 namespace cuda {
 /*!
  * \brief CUDA kernel of GE-SpMM on Csr.
  * \note GE-SpMM: https://arxiv.org/pdf/2007.03179.pdf
  * The grid dimension x and y are reordered for better performance.
...
@@ -97,7 +97,7 @@ from dgl.data import citation_graph as citegrh
 data = citegrh.load_cora()
 G = data[0]
-labels = th.tensor(data.labels)
+labels = th.tensor(G.ndata['label'])
 # find all the nodes labeled with class 0
 label0_nodes = th.nonzero(labels == 0, as_tuple=False).squeeze()
...
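The ``th.nonzero(labels == 0, as_tuple=False).squeeze()`` pattern in the updated tutorial line has a direct NumPy analogue; a small sketch with a toy label vector standing in for ``G.ndata['label']``:

```python
import numpy as np

# Toy label vector standing in for G.ndata['label'].
labels = np.array([0, 2, 0, 1, 0])

# NumPy analogue of th.nonzero(labels == 0, as_tuple=False).squeeze():
# np.nonzero returns a tuple of index arrays, one per dimension.
label0_nodes = np.nonzero(labels == 0)[0]
print(label0_nodes.tolist())  # [0, 2, 4]
```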
@@ -303,11 +303,9 @@ import networkx as nx
 def load_cora_data():
     data = citegrh.load_cora()
-    features = torch.FloatTensor(data.features)
-    labels = torch.LongTensor(data.labels)
-    mask = torch.BoolTensor(data.train_mask)
     g = data[0]
-    return g, features, labels, mask
+    mask = torch.BoolTensor(g.ndata['train_mask'])
+    return g, g.ndata['feat'], g.ndata['label'], mask
 ##############################################################################
 # The training loop is exactly the same as in the GCN tutorial.
...
@@ -16,28 +16,28 @@ Tree-LSTM in DGL
 examples <https://github.com/dmlc/dgl/tree/master/examples>`_.
 """
 ##############################################################################
 #
 # In this tutorial, you learn to use Tree-LSTM networks for sentiment analysis.
 # The Tree-LSTM is a generalization of long short-term memory (LSTM) networks to tree-structured network topologies.
 #
 # The Tree-LSTM structure was first introduced by Tai et al. in an ACL 2015
 # paper: `Improved Semantic Representations From Tree-Structured Long
 # Short-Term Memory Networks <https://arxiv.org/pdf/1503.00075.pdf>`__.
 # The core idea is to introduce syntactic information for language tasks by
 # extending the chain-structured LSTM to a tree-structured LSTM. The dependency
 # tree and constituency tree techniques are leveraged to obtain a ''latent tree''.
 #
 # The challenge in training Tree-LSTMs is batching --- a standard
 # technique in machine learning to accelerate optimization. However, since trees
 # generally have different shapes by nature, parallelization is non-trivial.
 # DGL offers an alternative. Pool all the trees into one single graph then
 # induce the message passing over them, guided by the structure of each tree.
 #
 # The task and the dataset
 # ------------------------
 #
 # The steps here use the
 # `Stanford Sentiment Treebank <https://nlp.stanford.edu/sentiment/>`__ in
 # ``dgl.data``. The dataset provides a fine-grained, tree-level sentiment
@@ -48,7 +48,7 @@ Tree-LSTM in DGL
 # their embeddings would be masked to all-zero.
 #
 # .. figure:: https://i.loli.net/2018/11/08/5be3d4bfe031b.png
 #    :alt:
 #
 # The figure displays one sample of the SST dataset, which is a
 # constituency parse tree with their nodes labeled with sentiment. To
@@ -69,8 +69,8 @@ SSTBatch = namedtuple('SSTBatch', ['graph', 'mask', 'wordid', 'label'])
 # The non-leaf nodes have a special word PAD_WORD. The sentiment
 # label is stored in the "y" feature field.
 trainset = SSTDataset(mode='tiny')  # the "tiny" set has only five trees
-tiny_sst = trainset.trees
-num_vocabs = trainset.num_vocabs
+tiny_sst = [tr for tr in trainset]
+num_vocabs = trainset.vocab_size
 num_classes = trainset.num_classes
 vocab = trainset.vocab  # vocabulary dict: key -> id
@@ -108,25 +108,25 @@ plot_tree(graph.to_networkx())
 # .. note::
 #
 #    **Definition**: :func:`~dgl.batch` unions a list of :math:`B`
 #    :class:`~dgl.DGLGraph`\ s and returns a :class:`~dgl.DGLGraph` of batch
 #    size :math:`B`.
 #
 #    - The union includes all the nodes,
 #      edges, and their features. The order of nodes, edges, and features is
 #      preserved.
 #
 #    - Given that you have :math:`V_i` nodes for graph
 #      :math:`\mathcal{G}_i`, the node ID :math:`j` in graph
 #      :math:`\mathcal{G}_i` corresponds to node ID
 #      :math:`j + \sum_{k=1}^{i-1} V_k` in the batched graph.
 #
 #    - Therefore, performing feature transformation and message passing on
 #      the batched graph is equivalent to doing those
 #      on all ``DGLGraph`` constituents in parallel.
 #
 #    - Duplicate references to the same graph are
 #      treated as deep copies; the nodes, edges, and features are duplicated,
 #      and mutation on one reference does not affect the other.
 #    - The batched graph keeps track of the meta
 #      information of the constituents so it can be
 #      :func:`~dgl.batched_graph.unbatch`\ ed to a list of ``DGLGraph``\ s.
@@ -135,9 +135,9 @@ plot_tree(graph.to_networkx())
 # ------------------------------------------------
 #
 # Researchers have proposed two types of Tree-LSTMs: Child-Sum
 # Tree-LSTMs, and :math:`N`-ary Tree-LSTMs. In this tutorial you focus
 # on applying *Binary* Tree-LSTM to binarized constituency trees. This
 # application is also known as *Constituency Tree-LSTM*. Use PyTorch
 # as a backend framework to set up the network.
 #
 # In `N`-ary Tree-LSTM, each unit at node :math:`j` maintains a hidden
@@ -145,7 +145,7 @@ plot_tree(graph.to_networkx())
 # :math:`j` takes the input vector :math:`x_j` and the hidden
 # representations of the child units: :math:`h_{jl}, 1\leq l\leq N` as
 # input, then update its new hidden representation :math:`h_j` and memory
 # cell :math:`c_j` by:
 #
 # .. math::
 #
@@ -164,7 +164,7 @@ plot_tree(graph.to_networkx())
 # ``apply_node_func``, a user specifies what to do with node features,
 # without considering edge features and messages. In a Tree-LSTM case,
 # ``apply_node_func`` is a must, since there exist (leaf) nodes with
 # :math:`0` incoming edges, which would not be updated with
 # ``reduce_func``.
 #
@@ -337,7 +337,7 @@ weight_decay = 1e-4
 epochs = 10
 # create the model
-model = TreeLSTM(trainset.num_vocabs,
+model = TreeLSTM(trainset.vocab_size,
                  x_size,
                  h_size,
                  trainset.num_classes,
@@ -373,7 +373,7 @@ for epoch in range(epochs):
         c = th.zeros((n, h_size))
         logits = model(batch, h, c)
         logp = F.log_softmax(logits, 1)
         loss = F.nll_loss(logp, batch.label, reduction='sum')
         optimizer.zero_grad()
         loss.backward()
         optimizer.step()
...
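The ``[tr for tr in trainset]`` idiom that replaces the removed ``trees`` attribute needs nothing beyond ``__len__`` and ``__getitem__``, which every ``DGLDataset`` provides. A minimal sketch with a hypothetical ``ToyDataset`` (strings stand in for the DGLGraph trees):

```python
# ToyDataset is a hypothetical stand-in: any object with __len__ and
# __getitem__ supports the [tr for tr in dataset] idiom used above,
# because Python falls back to indexing 0, 1, 2, ... until IndexError.
class ToyDataset:
    def __init__(self, graphs):
        self._graphs = graphs

    def __len__(self):
        return len(self._graphs)

    def __getitem__(self, idx):
        return self._graphs[idx]

trainset = ToyDataset(["g0", "g1", "g2"])  # strings stand in for DGLGraphs
tiny_sst = [tr for tr in trainset]
print(tiny_sst)  # ['g0', 'g1', 'g2']
```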