[KG][Model] Knowledge graph embeddings (#888)

* upd
* fig edgebatch edges
* add test
* trigger
* Update README.md for pytorch PinSage example. Add a note that the PinSage model example under example/pytorch/recommendation only works with Python 3.6+, as its dataset loader depends on the stanfordnlp package, which works only with Python 3.6+.
* Provide a frame agnostic API to test nn modules on both CPU and CUDA side.
  1. make dgl.nn.xxx frame agnostic
  2. make test.backend include dgl.nn modules
  3. modify test_edge_softmax in test/mxnet/test_nn.py and test/pytorch/test_nn.py to work on both CPU and GPU
* Fix style
* Delete unused code
* Make agnostic test only related to tests/backend
  1. clear all agnostic related code in dgl.nn
  2. make test_graph_conv agnostic to cpu/gpu
* Fix code style
* fix
* doc
* Make all test code under tests.mxnet/pytorch.test_nn.py work on both CPU and GPU.
* Fix syntax
* Remove rand
* Add TAGCN nn.module and example
* Now tagcn can run on CPU.
* Add unit test for TGConv
* Fix style
* For pubmed dataset, using --lr=0.005 can achieve better acc
* Fix style
* Fix some descriptions
* trigger
* Fix doc
* Add nn.TGConv and example
* Fix bug
* Update data in mxnet.tagcn test acc.
* Fix some comments and code
* delete useless code
* Fix naming
* Fix bug
* Fix bug
* Add test for mxnet TAGCov
* Add test code for mxnet TAGCov
* Update some docs
* Fix some code
* Update docs dgl.nn.mxnet
* Update weight init
* Fix
* init version.
* change default value of regularization.
* avoid specifying adversarial_temperature
* use default eval_interval.
* remove original model.
* remove optimizer.
* set default value of num_proc
* set default value of log_interval.
* don't need to set neg_sample_size_valid.
* remove unused code.
* use uni_weight by default.
* unify model.
* rename model.
* remove unnecessary data sampler.
* remove the code for checkpoint.
* fix eval.
* raise exception in invalid arguments.
* remove RowAdagrad.
* remove unsupported score function for now.
* Fix bugs of kg. Update README
* Update Readme for mxnet distmult
* Update README.md
* Update README.md
* revert changes on dmlc
* add tests.
* update CI.
* add tests script.
* reorder tests in CI.
* measure performance.
* add results on wn18
* remove some code.
* rename the training script.
* new results on TransE.
* remove --train.
* add format.
* fix.
* use EdgeSubgraph.
* create PBGNegEdgeSubgraph to simplify the code.
* fix test
* fix CI.
* run nose for unit tests.
* remove unused code in dataset.
* change argument to save embeddings.
* test training and eval scripts in CI.
* check Pytorch version.
* fix a minor problem in config.
* fix a minor bug.
* fix readme.
* Update README.md
* Update README.md
* Update README.md

Commit 15b951d4, authored Oct 02, 2019 by Da Zheng; committed by GitHub Oct 02, 2019

Showing with 66 additions and 0 deletions

python/dgl/contrib/sampling/sampler.py +3 -0

tests/scripts/task_kg_test.sh +63 -0

--- a/python/dgl/contrib/sampling/sampler.py
+++ b/python/dgl/contrib/sampling/sampler.py
@@ -582,6 +582,9 @@ class EdgeSampler(object):
            assert g.number_of_edges() == len(relations)
        self._relations = relations
+        if batch_size < 0 or neg_sample_size < 0:
+            raise Exception('Invalid arguments')
        self._return_false_neg = return_false_neg
        self._batch_size = int(batch_size)
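
The check added in this hunk rejects negative sizes before they are stored on the sampler. A minimal standalone sketch of the same guard follows. The helper name `check_sampler_sizes` is hypothetical and not part of DGL, and it raises `ValueError` with a descriptive message where the commit raises a bare `Exception`; that substitution is a suggestion, not what the commit does.

```python
def check_sampler_sizes(batch_size, neg_sample_size):
    """Hypothetical helper mirroring the guard added to EdgeSampler.__init__.

    Returns both sizes as ints, or raises if either one is negative.
    """
    if batch_size < 0 or neg_sample_size < 0:
        # The commit raises Exception('Invalid arguments'); ValueError with
        # the offending values is an assumed, more specific alternative.
        raise ValueError(
            "batch_size and neg_sample_size must be non-negative, got "
            f"batch_size={batch_size}, neg_sample_size={neg_sample_size}"
        )
    return int(batch_size), int(neg_sample_size)
```

Callers that catch `Exception` would still catch `ValueError`, so the narrower type should not break existing handlers.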

--- a/tests/scripts/task_kg_test.sh
+++ b/tests/scripts/task_kg_test.sh
+#!/bin/bash
+KG_DIR="./apps/kg/"
+function fail {
+    echo "FAIL: $*"
+    exit 1
+}
+function usage {
+    echo "Usage: $0 backend device"
+}
+# check arguments
+if [ $# -ne 2 ]; then
+    usage
+    fail "Error: must specify backend and device"
+fi
+if [ "$2" == "cpu" ]; then
+    dev=-1
+elif [ "$2" == "gpu" ]; then
+    export CUDA_VISIBLE_DEVICES=0
+    dev=0
+else
+    usage
+    fail "Unknown device $2"
+fi
+export DGLBACKEND=$1
+export DGL_LIBRARY_PATH=${PWD}/build
+export PYTHONPATH=${PWD}/python:$KG_DIR:$PYTHONPATH
+export DGL_DOWNLOAD_DIR=${PWD}
+# test
+pushd "$KG_DIR" > /dev/null
+python3 -m nose -v --with-xunit tests/test_score.py || fail "run test_score.py on $1"
+if [ "$2" == "cpu" ]; then
+    # verify CPU training
+    python3 train.py --model DistMult --dataset FB15k --batch_size 128 \
+        --neg_sample_size 16 --hidden_dim 100 --gamma 500.0 --lr 0.1 --max_step 100 \
+        --batch_size_eval 16 --valid --test -adv --eval_interval 30 --eval_percent 0.01
+elif [ "$2" == "gpu" ]; then
+    # verify GPU training
+    python3 train.py --model DistMult --dataset FB15k --batch_size 128 \
+        --neg_sample_size 16 --hidden_dim 100 --gamma 500.0 --lr 0.1 --max_step 100 \
+        --batch_size_eval 16 --gpu 0 --valid --test -adv --eval_interval 30 --eval_percent 0.01
+    # verify mixed CPU GPU training
+    python3 train.py --model DistMult --dataset FB15k --batch_size 128 \
+        --neg_sample_size 16 --hidden_dim 100 --gamma 500.0 --lr 0.1 --max_step 100 \
+        --batch_size_eval 16 --gpu 0 --valid --test -adv --mix_cpu_gpu --eval_percent 0.01 \
+        --save_emb DistMult_FB15k_emb
+    # verify saving training result
+    python3 eval.py --model_name DistMult --dataset FB15k --hidden_dim 2000 \
+        --gamma 500.0 --batch_size 16 --gpu 0 --model_path DistMult_FB15k_emb/ --eval_percent 0.01
+fi
+popd > /dev/null
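
The three `train.py` invocations in the script share most of their flags and differ only in the device-related options. The sketch below spells out that structure in Python. The function `train_commands` and the constant `COMMON_FLAGS` are hypothetical names introduced for illustration. Every flag is copied from the script, but the flag order differs, which assumes `train.py` parses its arguments order-independently.

```python
# Flags shared by every train.py run in task_kg_test.sh (copied from the script).
COMMON_FLAGS = [
    "--model", "DistMult", "--dataset", "FB15k", "--batch_size", "128",
    "--neg_sample_size", "16", "--hidden_dim", "100", "--gamma", "500.0",
    "--lr", "0.1", "--max_step", "100", "--batch_size_eval", "16",
    "--valid", "--test", "-adv", "--eval_percent", "0.01",
]


def train_commands(device):
    """Hypothetical helper: return the train.py argv lists the script runs."""
    base = ["python3", "train.py"] + COMMON_FLAGS
    if device == "cpu":
        # CPU-only training run.
        return [base + ["--eval_interval", "30"]]
    if device == "gpu":
        return [
            # GPU training run.
            base + ["--gpu", "0", "--eval_interval", "30"],
            # Mixed CPU/GPU run that also saves the embeddings for eval.py.
            base + ["--gpu", "0", "--mix_cpu_gpu",
                    "--save_emb", "DistMult_FB15k_emb"],
        ]
    # Mirrors the script's "Unknown device" failure branch.
    raise ValueError(f"Unknown device {device}")
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` so that a non-zero exit status fails the test run, which the shell script as committed does not check for the training commands.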