Experiments
===========

Comparison Experiment
---------------------

For the detailed experiment scripts and output logs, please refer to this `repo`_.

Data
^^^^

We use five datasets to conduct our comparison experiments. Details of the data are listed in the following table:

+-------------+-------------------------+------------------------------------------------------------------------+-------------------+----------------+---------------------------------------------+
| **Data**    | **Task**                | **Link**                                                               | **#Train\_Set**   | **#Feature**   | **Comments**                                |
+=============+=========================+========================================================================+===================+================+=============================================+
| Higgs       | Binary classification   | `link <https://archive.ics.uci.edu/ml/datasets/HIGGS>`__               | 10,500,000        | 28             | use last 500,000 samples as test set        |
+-------------+-------------------------+------------------------------------------------------------------------+-------------------+----------------+---------------------------------------------+
| Yahoo LTR   | Learning to rank        | `link <https://webscope.sandbox.yahoo.com/catalog.php?datatype=c>`__   | 473,134           | 700            | set1.train as train, set1.test as test      |
+-------------+-------------------------+------------------------------------------------------------------------+-------------------+----------------+---------------------------------------------+
| MS LTR      | Learning to rank        | `link <http://research.microsoft.com/en-us/projects/mslr/>`__          | 2,270,296         | 137            | {S1,S2,S3} as train set, {S5} as test set   |
+-------------+-------------------------+------------------------------------------------------------------------+-------------------+----------------+---------------------------------------------+
| Expo        | Binary classification   | `link <http://stat-computing.org/dataexpo/2009/>`__                    | 11,000,000        | 700            | use last 1,000,000 as test set              |
+-------------+-------------------------+------------------------------------------------------------------------+-------------------+----------------+---------------------------------------------+
| Allstate    | Binary classification   | `link <https://www.kaggle.com/c/ClaimPredictionChallenge>`__           | 13,184,290        | 4228           | use last 1,000,000 as test set              |
+-------------+-------------------------+------------------------------------------------------------------------+-------------------+----------------+---------------------------------------------+

Environment
^^^^^^^^^^^

We use one Linux server as the experiment platform. Details are listed in the following table:

+--------------------+-------------------+-----------------------+
| **OS**             | **CPU**           | **Memory**            |
+====================+===================+=======================+
| Ubuntu 14.04 LTS   | 2 \* E5-2670 v3   | DDR4 2133MHz, 256GB   |
+--------------------+-------------------+-----------------------+

Baseline
^^^^^^^^

We use `xgboost`_ as a baseline.

Both xgboost and LightGBM are built with OpenMP support.

Settings
^^^^^^^^

We set up three settings for the experiments. Their parameters are:

1. xgboost:

   .. code::

       eta = 0.1
       max_depth = 8
       num_round = 500
       nthread = 16
       tree_method = exact
       min_child_weight = 100

2. xgboost\_hist (using histogram based algorithm):

   .. code::

       eta = 0.1
       num_round = 500
       nthread = 16
       min_child_weight = 100
       tree_method = hist
       grow_policy = lossguide
       max_depth = 0
       max_leaves = 255

3. LightGBM:

   .. code::

       learning_rate = 0.1
       num_leaves = 255
       num_trees = 500
       num_threads = 16
       min_data_in_leaf = 0
       min_sum_hessian_in_leaf = 100

xgboost grows trees depth-wise and controls model complexity with ``max_depth``.
LightGBM grows trees leaf-wise instead and controls model complexity with ``num_leaves``.
So we cannot compare them under exactly the same model setting. As a tradeoff, we use xgboost with ``max_depth=8``, which allows at most 256 (2\ :sup:`8`) leaves, to compare with LightGBM with ``num_leaves=255``.

Other parameters are default values.
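
As a rough illustration, the three settings can be written as plain Python dictionaries. The key names are copied from the listings above, with ``tree_method = hist`` taken for the histogram-based setting. The dictionary names and the helper ``max_leaves_for_depth`` are hypothetical and are not part of either library. No library is imported, so this is only a sketch of the configuration, not the benchmark scripts themselves.

.. code:: python

    # Hypothetical dictionaries mirroring the three settings listed above.
    XGBOOST_EXACT = {
        "eta": 0.1,
        "max_depth": 8,
        "num_round": 500,
        "nthread": 16,
        "tree_method": "exact",
        "min_child_weight": 100,
    }

    XGBOOST_HIST = {
        "eta": 0.1,
        "num_round": 500,
        "nthread": 16,
        "min_child_weight": 100,
        "tree_method": "hist",
        "grow_policy": "lossguide",
        "max_depth": 0,
        "max_leaves": 255,
    }

    LIGHTGBM = {
        "learning_rate": 0.1,
        "num_leaves": 255,
        "num_trees": 500,
        "num_threads": 16,
        "min_data_in_leaf": 0,
        "min_sum_hessian_in_leaf": 100,
    }


    def max_leaves_for_depth(max_depth):
        """Upper bound on the leaves of a binary tree with the given depth."""
        return 2 ** max_depth


    # Leaf budget of the depth-wise setting next to the leaf-wise one.
    depth_wise_budget = max_leaves_for_depth(XGBOOST_EXACT["max_depth"])
    print(depth_wise_budget, LIGHTGBM["num_leaves"])

A full binary tree of depth 8 has 256 leaves, so the depth-wise and leaf-wise settings have nearly the same leaf budget.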

Result
^^^^^^

Speed
'''''

For the speed comparison, we run only the training task, without any test or metric output, and we do not count the time spent on IO.

The following table compares the time cost:

+-------------+---------------+---------------------+------------------+
| **Data**    | **xgboost**   | **xgboost\_hist**   | **LightGBM**     |
+=============+===============+=====================+==================+
| Higgs       | 3794.34 s     | 551.898 s           | **238.505513 s** |
+-------------+---------------+---------------------+------------------+
| Yahoo LTR   | 674.322 s     | 265.302 s           | **150.18644 s**  |
+-------------+---------------+---------------------+------------------+
| MS LTR      | 1251.27 s     | 385.201 s           | **215.320316 s** |
+-------------+---------------+---------------------+------------------+
| Expo        | 1607.35 s     | 588.253 s           | **138.504179 s** |
+-------------+---------------+---------------------+------------------+
| Allstate    | 2867.22 s     | 1355.71 s           | **348.084475 s** |
+-------------+---------------+---------------------+------------------+

LightGBM is faster than both xgboost and xgboost\_hist on every dataset in the experiment.
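
To put the table into relative terms, the short sketch below divides each baseline time by the LightGBM time. The numbers are copied from the table above. The variable names are only illustrative.

.. code:: python

    # Training times in seconds, copied from the table above:
    # (xgboost, xgboost_hist, LightGBM)
    train_seconds = {
        "Higgs": (3794.34, 551.898, 238.505513),
        "Yahoo LTR": (674.322, 265.302, 150.18644),
        "MS LTR": (1251.27, 385.201, 215.320316),
        "Expo": (1607.35, 588.253, 138.504179),
        "Allstate": (2867.22, 1355.71, 348.084475),
    }

    # Ratio of each baseline's time to LightGBM's time (higher = LightGBM faster).
    speedup = {}
    for name, (xgb, xgb_hist, lgbm) in train_seconds.items():
        speedup[name] = (xgb / lgbm, xgb_hist / lgbm)
        print(f"{name}: {speedup[name][0]:.1f}x vs xgboost, "
              f"{speedup[name][1]:.1f}x vs xgboost_hist")

On these numbers, the ratio over xgboost ranges from roughly 4.5x (Yahoo LTR) to roughly 15.9x (Higgs), and the ratio over xgboost\_hist from roughly 1.8x (Yahoo LTR) to roughly 4.2x (Expo).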

Accuracy
''''''''

For a fair accuracy comparison, we evaluate all methods on the test data set.

+-------------+-----------------+---------------+---------------------+----------------+
| **Data**    | **Metric**      | **xgboost**   | **xgboost\_hist**   | **LightGBM**   |
+=============+=================+===============+=====================+================+
| Higgs       | AUC             | 0.839593      | 0.845605            | 0.845154       |
+-------------+-----------------+---------------+---------------------+----------------+
| Yahoo LTR   | NDCG\ :sub:`1`  | 0.719748      | 0.720223            | 0.732466       |
|             +-----------------+---------------+---------------------+----------------+
|             | NDCG\ :sub:`3`  | 0.717813      | 0.721519            | 0.738048       |
|             +-----------------+---------------+---------------------+----------------+
|             | NDCG\ :sub:`5`  | 0.737849      | 0.739904            | 0.756548       |
|             +-----------------+---------------+---------------------+----------------+
|             | NDCG\ :sub:`10` | 0.78089       | 0.783013            | 0.796818       |
+-------------+-----------------+---------------+---------------------+----------------+
| MS LTR      | NDCG\ :sub:`1`  | 0.483956      | 0.488649            | 0.524255       |
|             +-----------------+---------------+---------------------+----------------+
|             | NDCG\ :sub:`3`  | 0.467951      | 0.473184            | 0.505327       |
|             +-----------------+---------------+---------------------+----------------+
|             | NDCG\ :sub:`5`  | 0.472476      | 0.477438            | 0.510007       |
|             +-----------------+---------------+---------------------+----------------+
|             | NDCG\ :sub:`10` | 0.492429      | 0.496967            | 0.527371       |
+-------------+-----------------+---------------+---------------------+----------------+
| Expo        | AUC             | 0.756713      | 0.777777            | 0.777543       |
+-------------+-----------------+---------------+---------------------+----------------+
| Allstate    | AUC             | 0.607201      | 0.609042            | 0.609167       |
+-------------+-----------------+---------------+---------------------+----------------+
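
For readers unfamiliar with the ranking metric, the following is a minimal sketch of NDCG at a cutoff ``k``. It assumes one common definition, with gain ``2^rel - 1`` and a ``log2`` position discount. Whether the benchmark scripts use exactly this variant is not stated here, and returning ``0.0`` for a query with no relevant documents is also an assumption. The function names are hypothetical.

.. code:: python

    import math


    def dcg_at_k(relevances, k):
        """Discounted cumulative gain of the first k items, in ranked order."""
        return sum(
            (2 ** rel - 1) / math.log2(rank + 2)
            for rank, rel in enumerate(relevances[:k])
        )


    def ndcg_at_k(relevances, k):
        """DCG normalized by the DCG of the ideal (descending) ordering."""
        ideal = dcg_at_k(sorted(relevances, reverse=True), k)
        if ideal == 0:
            return 0.0  # assumption: queries without relevant items score 0
        return dcg_at_k(relevances, k) / ideal


    # Relevance labels of one query's documents, in the order the model ranked them.
    print(ndcg_at_k([3, 2, 1, 0], k=3))  # already ideally ordered
    print(ndcg_at_k([0, 1, 2], k=3))     # worst possible order of these labels

A perfectly ordered list scores 1, and any other ordering of the same labels scores lower.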

Memory Consumption
''''''''''''''''''

We monitor RES (resident memory) while running the training task. We also set ``two_round=true`` in LightGBM to reduce peak memory usage; this increases data-loading time but does not affect training speed or accuracy.

+-------------+---------------+---------------------+----------------+
| **Data**    | **xgboost**   | **xgboost\_hist**   | **LightGBM**   |
+=============+===============+=====================+================+
| Higgs       | 4.853GB       | 3.784GB             | **0.868GB**    |
+-------------+---------------+---------------------+----------------+
| Yahoo LTR   | 1.907GB       | 1.468GB             | **0.831GB**    |
+-------------+---------------+---------------------+----------------+
| MS LTR      | 5.469GB       | 3.654GB             | **0.886GB**    |
+-------------+---------------+---------------------+----------------+
| Expo        | 1.553GB       | 1.393GB             | **0.543GB**    |
+-------------+---------------+---------------------+----------------+
| Allstate    | 6.237GB       | 4.990GB             | **1.027GB**    |
+-------------+---------------+---------------------+----------------+
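
The figures above were read from the RES column of an external monitor. As a loosely related sketch, a Python process on a Unix-like system can report its own peak resident memory through the standard ``resource`` module. This is not how the benchmark was measured, and the helper name ``peak_rss_gb`` is hypothetical.

.. code:: python

    import resource
    import sys


    def peak_rss_gb():
        """Peak resident set size of the current process, in GB."""
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
        peak_bytes = peak if sys.platform == "darwin" else peak * 1024
        return peak_bytes / 1024 ** 3


    print(f"peak resident memory: {peak_rss_gb():.3f} GB")

Calling such a helper right after training would give a figure comparable in spirit to the table, although it would also include memory used for data loading.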

Parallel Experiment
-------------------

Data
^^^^

We use a terabyte click log dataset to conduct the parallel experiments. Details are listed in the following table:

+------------+-------------------------+------------+-----------------+----------------+
| **Data**   | **Task**                | **Link**   | **#Data**       | **#Feature**   |
+============+=========================+============+=================+================+
| Criteo     | Binary classification   | `link`_    | 1,700,000,000   | 67             |
+------------+-------------------------+------------+-----------------+----------------+

This data contains 13 integer features and 26 categorical features from 24 days of click logs.
We compute the CTR and count statistics for these 26 categorical features from the first ten days,
then use the next ten days' data, with the categorical features replaced by the corresponding CTR and count, as training data.
The processed training data has a total of 1.7 billion records and 67 features.
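
The preprocessing described above can be sketched on a toy example. The column name ``site``, the helper names, and the handling of categories unseen in the history period (CTR 0 and count 0) are all assumptions made for illustration. The actual preprocessing scripts may differ.

.. code:: python

    from collections import defaultdict


    def fit_ctr_stats(rows, column):
        """Return {category: (ctr, count)} computed from the historical rows."""
        counts = defaultdict(int)
        clicks = defaultdict(int)
        for row in rows:
            counts[row[column]] += 1
            clicks[row[column]] += row["click"]
        return {value: (clicks[value] / counts[value], counts[value])
                for value in counts}


    def encode(rows, column, stats, default=(0.0, 0)):
        """Replace a categorical column by its historical CTR and count."""
        encoded = []
        for row in rows:
            ctr, count = stats.get(row[column], default)
            new_row = {k: v for k, v in row.items() if k != column}
            new_row[column + "_ctr"] = ctr
            new_row[column + "_count"] = count
            encoded.append(new_row)
        return encoded


    # "history" stands in for the first ten days, "training" for the next ten.
    history = [
        {"site": "a", "click": 1},
        {"site": "a", "click": 0},
        {"site": "b", "click": 0},
    ]
    training = [{"site": "a", "click": 1}, {"site": "c", "click": 0}]

    stats = fit_ctr_stats(history, "site")
    encoded_training = encode(training, "site", stats)
    print(encoded_training)

Each categorical column becomes two numeric columns, and the statistics come only from the earlier period, so the training labels do not leak into their own features.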

Environment
^^^^^^^^^^^

We use 16 Windows servers as the experiment platform. Details are listed in the following table:

+----------------------+-----------------+----------------------+-------------------------------+
| **OS**               | **CPU**         | **Memory**           | **Network Adapter**           |
+======================+=================+======================+===============================+
| Windows Server 2012  | 2 * E5-2670 v2  | DDR3 1600MHz, 256GB  | Mellanox ConnectX-3, 54Gbps,  |
|                      |                 |                      | RDMA support                  |
+----------------------+-----------------+----------------------+-------------------------------+

Settings
^^^^^^^^

.. code::

    learning_rate = 0.1
    num_leaves = 255
    num_trees = 100
    num_thread = 16
    tree_learner = data

We use data-parallel learning here, since this data is large in ``#data`` but small in ``#feature``.

Other parameters are default values.

Result
^^^^^^

+----------------+---------------------+---------------------------------+
| **#Machine**   | **Time per Tree**   | **Memory Usage(per Machine)**   |
+================+=====================+=================================+
| 1              | 627.8 s             | 176GB                           |
+----------------+---------------------+---------------------------------+
| 2              | 311 s               | 87GB                            |
+----------------+---------------------+---------------------------------+
| 4              | 156 s               | 43GB                            |
+----------------+---------------------+---------------------------------+
| 8              | 80 s                | 22GB                            |
+----------------+---------------------+---------------------------------+
| 16             | 42 s                | 11GB                            |
+----------------+---------------------+---------------------------------+

From the results, we find that LightGBM achieves near-linear speedup in parallel learning.
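
The scaling claim can be checked directly from the table. The sketch below computes the speedup relative to one machine and the parallel efficiency (speedup divided by machine count). The numbers are copied from the table, and the variable names are only illustrative.

.. code:: python

    # Time per tree in seconds for each machine count, copied from the table above.
    time_per_tree = {1: 627.8, 2: 311.0, 4: 156.0, 8: 80.0, 16: 42.0}

    baseline = time_per_tree[1]
    scaling = {}
    for machines, seconds in time_per_tree.items():
        speedup = baseline / seconds
        efficiency = speedup / machines  # 1.0 would be perfectly linear scaling
        scaling[machines] = (speedup, efficiency)
        print(f"{machines:2d} machines: speedup {speedup:5.2f}, "
              f"efficiency {efficiency:.2f}")

With 16 machines the speedup is about 14.9, which corresponds to a parallel efficiency of about 0.93.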

GPU Experiments
---------------

Refer to `GPU Performance <./GPU-Performance.rst>`__.

.. _repo: https://github.com/guolinke/boosting_tree_benchmarks

.. _xgboost: https://github.com/dmlc/xgboost

.. _link: http://labs.criteo.com/2013/12/download-terabyte-click-logs/