# DGL - Knowledge Graph Embedding

**Note: DGL-KE has moved to [awslabs/dgl-ke](https://github.com/awslabs/dgl-ke). The copy of DGL-KE in this folder is deprecated.**

## Introduction

DGL-KE is a DGL-based package for efficiently computing node embeddings and relation embeddings of
knowledge graphs. This package is adapted from
[KnowledgeGraphEmbedding](https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding).
It enables fast and scalable training of knowledge graph embeddings
while remaining as extensible as
[KnowledgeGraphEmbedding](https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding).
On a single machine,
training takes only a few minutes on medium-size knowledge graphs such as FB15k and wn18, and
a couple of hours on Freebase, which has hundreds of millions of edges.

DGL-KE includes the following knowledge graph embedding models:

- TransE (TransE_l1 with L1 distance and TransE_l2 with L2 distance)
- DistMult
- ComplEx
- RESCAL
- TransR
- RotatE

Other popular models will be added in the future.
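
As a rough illustration of how two of these models differ, the sketch below scores a single (head, relation, tail) triple on plain Python lists. The function names and the default `gamma` value are illustrative and are not DGL-KE's API. The TransE form assumes the margin-based variant (margin minus the L1 or L2 distance of `head + rel - tail`), and DistMult is the usual trilinear product.

```python
import math

def transe_score(head, rel, tail, gamma=12.0, norm="l1"):
    """Score a triple as gamma - ||head + rel - tail|| (higher is more plausible)."""
    diff = [h + r - t for h, r, t in zip(head, rel, tail)]
    if norm == "l1":
        dist = sum(abs(d) for d in diff)
    elif norm == "l2":
        dist = math.sqrt(sum(d * d for d in diff))
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    return gamma - dist

def distmult_score(head, rel, tail):
    """Score a triple as the trilinear product sum_i head_i * rel_i * tail_i."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))

# Toy two-dimensional embeddings (hypothetical values).
head = [1.0, 0.0]
rel = [0.0, 1.0]
tail = [1.0, 1.0]     # equals head + rel, so the TransE distance is zero
other = [4.0, 5.0]    # far from head + rel, so it scores lower

print(transe_score(head, rel, tail, norm="l1"))
print(transe_score(head, rel, other, norm="l2"))
print(distmult_score(head, rel, tail))
```

The `TransE_l1` and `TransE_l2` variants differ only in the distance used inside the score.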

DGL-KE supports multiple training modes:

- CPU training
- GPU training
- Joint CPU & GPU training
- Multiprocessing training on CPUs

For joint CPU & GPU training, node embeddings are stored in CPU memory and mini-batches are trained on the GPU. This mode is designed for training KGE models on large knowledge graphs.

For multiprocessing training, each process trains mini-batches independently and uses shared memory to communicate with the other processes. This mode is designed for training KGE models on large knowledge graphs with many CPU cores.
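
The data flow of joint CPU & GPU training can be pictured as a gather, compute, scatter loop over the embedding table. The sketch below mimics that loop with plain Python lists and no real device transfer. `gather_rows`, `scatter_update` and the toy gradient are hypothetical names for illustration and do not reflect DGL-KE's actual implementation.

```python
def gather_rows(table, node_ids):
    """Copy only the rows a mini-batch needs (stand-in for a CPU-to-GPU transfer)."""
    return {nid: list(table[nid]) for nid in node_ids}

def scatter_update(table, grads, lr):
    """Apply SGD to the touched rows only (stand-in for writing updates back to CPU)."""
    for nid, grad in grads.items():
        table[nid] = [w - lr * g for w, g in zip(table[nid], grad)]

# The full embedding table stays in host memory: 5 nodes, 2 dimensions.
table = [[0.0, 0.0] for _ in range(5)]

batch_nodes = [1, 3]                      # nodes touched by one mini-batch
rows = gather_rows(table, batch_nodes)    # small working set for the "device"

# Toy gradient: pretend the loss gradient is 1.0 in every dimension.
grads = {nid: [1.0] * len(vec) for nid, vec in rows.items()}
scatter_update(table, grads, lr=0.5)

print(table)
```

Only the rows for nodes 1 and 3 change. The rest of the table is never copied, which is why the full table can stay in CPU memory.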

We will support multi-GPU training and distributed training in the near future.

## Requirements

The package runs with both PyTorch and MXNet. It works with PyTorch v1.2 or newer and with MXNet 1.5 or newer.

## Built-in Datasets

DGL-KE provides five built-in knowledge graphs:

| Dataset | #nodes | #edges | #relations |
|---------|--------|--------|------------|
| [FB15k](https://data.dgl.ai/dataset/FB15k.zip) | 14951 | 592213 | 1345 |
| [FB15k-237](https://data.dgl.ai/dataset/FB15k-237.zip) | 14541 | 310116 | 237 |
| [wn18](https://data.dgl.ai/dataset/wn18.zip) | 40943 | 151442 | 18 |
| [wn18rr](https://data.dgl.ai/dataset/wn18rr.zip) | 40943 | 93003 | 11 |
| [Freebase](https://data.dgl.ai/dataset/Freebase.zip) | 86054151 | 338586276 | 14824 |

Users can specify one of the datasets with `--dataset` in `train.py` and `eval.py`.

## Performance

The 1-GPU speed is measured with 8 CPU cores and one Nvidia V100 GPU (AWS P3.2xlarge).
The 8-GPU speed is measured with 64 CPU cores and eight Nvidia V100 GPUs (AWS P3.16xlarge).

The speed on FB15k (1 GPU)

|  Models | TransE_l1 | TransE_l2 | DistMult | ComplEx | RESCAL | TransR | RotatE |
|---------|-----------|-----------|----------|---------|--------|--------|--------|
|MAX_STEPS| 48000     | 32000     | 40000    | 100000  | 32000  | 32000  | 20000  |
|TIME     | 370s      | 270s      | 312s     | 282s    | 2095s  | 1556s  | 1861s  |

The accuracy on FB15k (1 GPU)

|  Models   |  MR   |  MRR  | HITS@1 | HITS@3 | HITS@10 |
|-----------|-------|-------|--------|--------|---------|
| TransE_l1 | 44.18 | 0.675 | 0.551  | 0.774  | 0.861   |
| TransE_l2 | 46.71 | 0.665 | 0.551  | 0.804  | 0.846   |
| DistMult  | 61.04 | 0.725 | 0.625  | 0.837  | 0.883   |
| ComplEx   | 64.59 | 0.785 | 0.718  | 0.835  | 0.889   |
| RESCAL    | 122.3 | 0.669 | 0.598  | 0.711  | 0.793   |
| TransR    | 59.86 | 0.676 | 0.591  | 0.735  | 0.814   |
| RotatE    | 43.66 | 0.728 | 0.632  | 0.801  | 0.874   |
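
The columns in the accuracy tables are standard ranking metrics. As a reminder of how they are typically computed, the sketch below derives MR, MRR and HITS@k from a list of 1-based ranks of the correct entity. The function name and the sample ranks are made up for illustration.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and HITS@k from 1-based ranks of the correct entities."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                     # mean rank, lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,   # mean reciprocal rank, higher is better
    }
    for k in ks:
        # Fraction of test cases whose correct entity is ranked in the top k.
        metrics["HITS@%d" % k] = sum(1 for r in ranks if r <= k) / n
    return metrics

sample_ranks = [1, 2, 4, 20]   # hypothetical ranks for four test triples
print(ranking_metrics(sample_ranks))
```

A lower MR is better, while higher MRR and HITS@k are better, which is why a model can lead on one column and trail on another.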


The speed on FB15k (8 GPU)

|  Models | TransE_l1 | TransE_l2 | DistMult | ComplEx | RESCAL | TransR | RotatE |
|---------|-----------|-----------|----------|---------|--------|--------|--------|
|MAX_STEPS| 6000      | 4000      | 5000     | 4000    | 4000   | 4000   | 2500   |
|TIME     | 88.93s    | 62.99s    | 72.74s   | 68.37s  | 245.9s | 203.9s | 126.7s |

The accuracy on FB15k (8 GPU)

|  Models   |  MR   |  MRR  | HITS@1 | HITS@3 | HITS@10 |
|-----------|-------|-------|--------|--------|---------|
| TransE_l1 | 44.25 | 0.672 | 0.547  | 0.774  | 0.860   |
| TransE_l2 | 46.13 | 0.658 | 0.539  | 0.748  | 0.845   |
| DistMult  | 61.72 | 0.723 | 0.626  | 0.798  | 0.881   |
| ComplEx   | 65.84 | 0.754 | 0.676  | 0.813  | 0.880   |
| RESCAL    | 135.6 | 0.652 | 0.580  | 0.693  | 0.779   |
| TransR    | 65.27 | 0.676 | 0.591  | 0.736  | 0.811   |
| RotatE    | 49.59 | 0.683 | 0.581  | 0.759  | 0.848   |

In comparison, GraphVite uses 4 GPUs and takes 14 minutes. Thus, with 8 GPUs, DGL-KE trains TransE on FB15k about 9.5X as fast as GraphVite. More performance information on GraphVite can be found [here](https://github.com/DeepGraphLearning/graphvite).
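
The quoted speedup can be checked against the numbers above. The sketch assumes the figure compares GraphVite's 14 minutes with the 8-GPU TransE_l1 time of 88.93 s from the table; the README does not say which TransE variant is meant.

```python
graphvite_seconds = 14 * 60   # 14 minutes reported for GraphVite on 4 GPUs
dglke_seconds = 88.93         # TransE_l1 on FB15k with 8 GPUs (assumed variant)

speedup = graphvite_seconds / dglke_seconds
print("speedup: %.2fX" % speedup)
```

Under that assumption the ratio comes out a little under 9.5, consistent with the rounded figure quoted in the text.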

The speed on wn18 (1 GPU)

|  Models | TransE_l1 | TransE_l2 | DistMult | ComplEx | RESCAL | TransR | RotatE |
|---------|-----------|-----------|----------|---------|--------|--------|--------|
|MAX_STEPS| 32000     | 32000     | 20000    | 20000   | 20000  | 30000  | 24000  |
|TIME     | 531.5s    | 406.6s    | 284.1s   | 282.3s  | 443.6s | 766.2s | 829.4s |

The accuracy on wn18 (1 GPU)

|  Models   |  MR   |  MRR  | HITS@1 | HITS@3 | HITS@10 |
|-----------|-------|-------|--------|--------|---------|
| TransE_l1 | 318.4 | 0.764 | 0.602  | 0.929  | 0.949   |
| TransE_l2 | 206.2 | 0.561 | 0.306  | 0.800  | 0.944   |
| DistMult  | 486.0 | 0.818 | 0.711  | 0.921  | 0.948   |
| ComplEx   | 268.6 | 0.933 | 0.916  | 0.949  | 0.961   |
| RESCAL    | 536.6 | 0.848 | 0.790  | 0.900  | 0.927   |
| TransR    | 452.4 | 0.620 | 0.461  | 0.758  | 0.856   |
| RotatE    | 487.9 | 0.944 | 0.940  | 0.947  | 0.952   |

The speed on wn18 (8 GPU)

|  Models | TransE_l1 | TransE_l2 | DistMult | ComplEx | RESCAL | TransR | RotatE |
|---------|-----------|-----------|----------|---------|--------|--------|--------|
|MAX_STEPS| 4000      | 4000      | 2500     | 2500    | 2500   | 2500   | 3000   |
|TIME     | 119.3s    | 81.1s     | 76.0s    | 58.0s   | 594.1s | 1168s  | 139.8s |

The accuracy on wn18 (8 GPU)

|  Models   |  MR   |  MRR  | HITS@1 | HITS@3 | HITS@10 |
|-----------|-------|-------|--------|--------|---------|
| TransE_l1 | 360.3 | 0.745 | 0.562  | 0.930  | 0.951   |
| TransE_l2 | 193.8 | 0.557 | 0.301  | 0.799  | 0.942   |
| DistMult  | 499.9 | 0.807 | 0.692  | 0.917  | 0.945   |
| ComplEx   | 476.7 | 0.935 | 0.926  | 0.943  | 0.949   |
| RESCAL    | 618.8 | 0.848 | 0.791  | 0.897  | 0.927   |
| TransR    | 513.1 | 0.659 | 0.491  | 0.821  | 0.871   |
| RotatE    | 466.2 | 0.944 | 0.940  | 0.945  | 0.951   |


The speed on Freebase (8 GPU)

|  Models | TransE_l2 | DistMult | ComplEx | TransR | RotatE |
|---------|-----------|----------|---------|--------|--------|
|MAX_STEPS| 320000   | 300000   | 360000  | 300000 | 300000 |
|TIME     | 7908s     | 7425s    | 8946s   | 16816s | 12817s |

The accuracy on Freebase (8 GPU). It is evaluated with 1000 negative edges sampled for each positive edge.

|  Models   |  MR    |  MRR  | HITS@1 | HITS@3 | HITS@10 |
|-----------|--------|-------|--------|--------|---------|
| TransE_l2 | 22.4   | 0.756 | 0.688  | 0.800  | 0.882   |
| DistMult  | 45.4   | 0.833 | 0.812  | 0.843  | 0.872   |
| ComplEx   | 48.0   | 0.830 | 0.812  | 0.838  | 0.864   |
| TransR    | 51.2   | 0.697 | 0.656  | 0.716  | 0.771   |
| RotatE    | 93.3   | 0.770 | 0.749  | 0.780  | 0.805   |

The speed on Freebase (48 CPU)

This is measured with 48 CPU cores on an AWS r5dn.24xlarge instance.

|  Models | TransE_l2 | DistMult | ComplEx |
|---------|-----------|----------|---------|
|MAX_STEPS| 50000     | 50000    | 50000   |
|TIME     | 7002s     | 6340s    | 8133s   |

The accuracy on Freebase (48 CPU). It is evaluated with 1000 negative edges sampled for each positive edge.

|  Models   |  MR    |  MRR  | HITS@1 | HITS@3 | HITS@10 |
|-----------|--------|-------|--------|--------|---------|
| TransE_l2 | 30.8   | 0.814 | 0.764  | 0.848  | 0.902   |
| DistMult  | 45.1   | 0.834 | 0.815  | 0.843  | 0.871   |
| ComplEx   | 44.9   | 0.837 | 0.819  | 0.845  | 0.870   |

The configuration for reproducing the performance results can be found [here](https://github.com/dmlc/dgl/blob/master/apps/kg/config/best_config.sh).

## Usage

DGL-KE doesn't require installation. The package contains two scripts, `train.py` and `eval.py`.

* `train.py` trains knowledge graph embeddings and outputs the trained node embeddings
and relation embeddings.

* `eval.py` reads the pre-trained node embeddings and relation embeddings and evaluates
how accurately they predict the tail node when given (head, rel, ?) and the head node
when given (?, rel, tail).
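
To make the evaluation task concrete, the sketch below ranks the true tail for a (head, rel, ?) query against a set of candidate entities. It uses a TransE-style L1 score and invented two-dimensional embeddings purely for illustration. `l1_score` and `rank_tail` are hypothetical helpers, not functions from `eval.py`.

```python
def l1_score(head, rel, tail):
    """TransE-style score: negative L1 distance of head + rel - tail (higher is better)."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, rel, tail))

def rank_tail(entity_emb, rel_vec, head_id, true_tail_id, candidate_ids):
    """Return the 1-based rank of the true tail among the candidate tails."""
    head = entity_emb[head_id]
    true_score = l1_score(head, rel_vec, entity_emb[true_tail_id])
    # Count candidates that score strictly higher than the true tail.
    better = sum(
        1 for cid in candidate_ids
        if cid != true_tail_id and l1_score(head, rel_vec, entity_emb[cid]) > true_score
    )
    return better + 1

# Invented embeddings for four entities and one relation.
entity_emb = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [1.0, 0.2], 3: [5.0, 5.0]}
rel_vec = [1.0, 0.0]

# Query (0, rel, ?) with true tail 2; entity 1 matches head + rel exactly.
rank = rank_tail(entity_emb, rel_vec, head_id=0, true_tail_id=2, candidate_ids=[1, 2, 3])
print(rank)
```

Predicting the head for (?, rel, tail) works the same way with the roles of head and tail swapped. Ranks like these are what MR, MRR and HITS@k aggregate.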

### Input formats:

DGL-KE supports two knowledge graph input formats for user-defined datasets.

Format 1:

- `raw_udd_[h|r|t]`, raw user-defined dataset. In this format, users only need to provide triples and let the dataloader generate and manage the ID mapping. The dataloader generates two files: `entities.tsv` for the entity ID mapping and `relations.tsv` for the relation ID mapping. The order of the head, relation and tail entities is described by `[h|r|t]`; for example, `raw_udd_trh` means the triples are stored in the order of tail, relation and head. The dataset should contain three files:
  - *train* stores the triples in the training set. Each line is a triple, e.g., [src_name, rel_name, dst_name], and should follow the order specified in `[h|r|t]`.
  - *valid* stores the triples in the validation set. Each line is a triple, e.g., [src_name, rel_name, dst_name], and should follow the order specified in `[h|r|t]`.
  - *test* stores the triples in the test set. Each line is a triple, e.g., [src_name, rel_name, dst_name], and should follow the order specified in `[h|r|t]`.
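
The ID mapping step for a raw dataset can be sketched as follows. The code assumes tab-separated lines and assigns IDs in order of first appearance; both are assumptions made for illustration, and `parse_triple` and `build_id_maps` are hypothetical helpers rather than DGL-KE's dataloader.

```python
def parse_triple(line, order="hrt", sep="\t"):
    """Split one line and return (head, rel, tail) given a column order such as 'trh'."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 3 or sorted(order) != ["h", "r", "t"]:
        raise ValueError("expected three columns and an order made of h, r, t")
    by_role = dict(zip(order, fields))
    return by_role["h"], by_role["r"], by_role["t"]

def build_id_maps(lines, order="hrt"):
    """Assign integer IDs to entity and relation names in order of first appearance."""
    entity2id, relation2id, triples = {}, {}, []
    for line in lines:
        head, rel, tail = parse_triple(line, order)
        for name in (head, tail):
            entity2id.setdefault(name, len(entity2id))
        relation2id.setdefault(rel, len(relation2id))
        triples.append((entity2id[head], relation2id[rel], entity2id[tail]))
    return entity2id, relation2id, triples

# Two raw triples stored as tail, relation, head (the "trh" order).
raw = ["France\tcapital_of\tParis\n", "Europe\tlocated_in\tFrance\n"]
entities, relations, triples = build_id_maps(raw, order="trh")
print(entities)
print(relations)
print(triples)
```

The returned triples are always in (head, relation, tail) order, whatever column order the input files use.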

Format 2:

- `udd_[h|r|t]`, user-defined dataset. In this format, users should provide the ID mapping for entities and relations. The order of the head, relation and tail entities is described by `[h|r|t]`; for example, `udd_trh` means the triples are stored in the order of tail, relation and head. The dataset should contain five files:
  - *entities* stores the mapping between entity names and entity IDs.
  - *relations* stores the mapping between relation names and relation IDs.
  - *train* stores the triples in the training set. Each line is a triple, e.g., [src_id, rel_id, dst_id], and should follow the order specified in `[h|r|t]`.
  - *valid* stores the triples in the validation set. Each line is a triple, e.g., [src_id, rel_id, dst_id], and should follow the order specified in `[h|r|t]`.
  - *test* stores the triples in the test set. Each line is a triple, e.g., [src_id, rel_id, dst_id], and should follow the order specified in `[h|r|t]`.

### Output formats:

To save the trained embeddings, users have to provide the path with `--save_emb` when running
`train.py`. The saved embeddings are stored as numpy ndarrays.

* The node embedding is saved as `XXX_YYY_entity.npy`.

* The relation embedding is saved as `XXX_YYY_relation.npy`.

`XXX` is the dataset name and `YYY` is the model name.
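
Given the naming scheme above, the saved files can be located and loaded as sketched below. `embedding_paths` and `load_embeddings` are hypothetical helpers. The sketch assumes the files sit directly inside the directory passed to `--save_emb` and that NumPy is installed.

```python
import os

def embedding_paths(save_dir, dataset, model):
    """Build the expected file paths, following the XXX_YYY_{entity,relation}.npy pattern."""
    prefix = "%s_%s" % (dataset, model)
    return (
        os.path.join(save_dir, prefix + "_entity.npy"),
        os.path.join(save_dir, prefix + "_relation.npy"),
    )

def load_embeddings(save_dir, dataset, model):
    """Load both embedding matrices as NumPy arrays (requires the files to exist)."""
    import numpy as np  # imported lazily so the path helper works without NumPy
    entity_path, relation_path = embedding_paths(save_dir, dataset, model)
    return np.load(entity_path), np.load(relation_path)

entity_path, relation_path = embedding_paths("DistMult_FB15k_emb", "FB15k", "DistMult")
print(entity_path)
print(relation_path)
```

After a training run has written the files, `load_embeddings("DistMult_FB15k_emb", "FB15k", "DistMult")` would return the two arrays.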

### Command line parameters

Here are some examples of using the training script.

Train KGE models with GPU.

```bash
python3 train.py --model DistMult --dataset FB15k --batch_size 1024 --neg_sample_size 256 \
    --hidden_dim 400 --gamma 143.0 --lr 0.08 --batch_size_eval 16 --valid --test -adv \
    --gpu 0 --max_step 40000
```

Train KGE models with joint CPU & GPU training on multiple GPUs.

```bash
python3 train.py --model DistMult --dataset FB15k --batch_size 1024 --neg_sample_size 256 \
    --hidden_dim 400 --gamma 143.0 --lr 0.08 --batch_size_eval 16 --valid --test -adv \
    --max_step 5000 --mix_cpu_gpu --num_proc 8 --gpu 0 1 2 3 4 5 6 7 --async_update \
    --soft_rel_part --force_sync_interval 1000
```

Train embeddings and evaluate them later.

```bash
python3 train.py --model DistMult --dataset FB15k --batch_size 1024 --neg_sample_size 256 \
    --hidden_dim 400 --gamma 143.0 --lr 0.08 --batch_size_eval 16 --valid --test -adv \
     --gpu 0 --max_step 40000 --save_emb DistMult_FB15k_emb

python3 eval.py --model_name DistMult --dataset FB15k --hidden_dim 400 \
    --gamma 143.0 --batch_size 16 --gpu 0 --model_path DistMult_FB15k_emb/
```

Train embeddings with multi-processing. This currently doesn't work in MXNet.
```bash
python3 train.py --model TransE_l2 --dataset Freebase --batch_size 1000 \
    --neg_sample_size 200 --hidden_dim 400 --gamma 10 --lr 0.1 --max_step 50000 \
    --log_interval 100 --batch_size_eval 1000 --neg_sample_size_eval 1000 --test \
   -adv --regularization_coef 1e-9 --num_thread 1 --num_proc 48
```