LightGBM FAQ
============

Contents
~~~~~~~~

-  `Critical <#critical>`__

-  `LightGBM <#lightgbm>`__

-  `R-package <#r-package>`__

-  `Python-package <#python-package>`__

--------------

Critical
~~~~~~~~

Please post an issue in the `Microsoft/LightGBM repository <https://github.com/Microsoft/LightGBM/issues>`__ for any
LightGBM issues you encounter. For critical issues (crashes, prediction errors, nonsense outputs, ...), you may also ping a
member of the core team according to the relevant area of expertise by mentioning them with the at sign (@):

-  `@guolinke <https://github.com/guolinke>`__ (C++ code / R-package / Python-package)
-  `@chivee <https://github.com/chivee>`__ (C++ code / Python-package)
-  `@Laurae2 <https://github.com/Laurae2>`__ (R-package)
-  `@wxchan <https://github.com/wxchan>`__ (Python-package)
-  `@henry0312 <https://github.com/henry0312>`__ (Python-package)
-  `@StrikerRUS <https://github.com/StrikerRUS>`__ (Python-package)
-  `@huanzhang12 <https://github.com/huanzhang12>`__ (GPU support)

Please include as much of the following information as possible when submitting a critical issue:

-  Is it reproducible on CLI (command line interface), R, and/or Python?

-  Is it specific to a wrapper? (R or Python?)

-  Is it specific to the compiler? (gcc versions? MinGW versions?)

-  Is it specific to your Operating System? (Windows? Linux?)

-  Are you able to reproduce this issue with a simple case?

-  Does the issue persist after removing all optimization flags and compiling LightGBM in debug mode?

When submitting issues, please keep in mind that this is largely a volunteer effort, and we may not be available 24/7 to provide support.

--------------

LightGBM
~~~~~~~~

-  **Question 1**: Where do I find more details about LightGBM parameters?

-  **Solution 1**: Take a look at `Parameters <./Parameters.rst>`__ and the `Laurae++/Parameters <https://sites.google.com/view/lauraepp/parameters>`__ website.

--------------

-  **Question 2**: On datasets with millions of features, training does not start (or starts only after a very long time).

-  **Solution 2**: Use a smaller value for ``bin_construct_sample_cnt`` and a larger value for ``min_data``.
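   As an illustrative sketch (the values below are placeholders, not tuned recommendations), these parameters can be set through the Python package:

   ```python
   # Illustrative values only: sample fewer rows when constructing histogram
   # bins, and require more data per leaf so extremely sparse features are
   # effectively pruned.
   params = {
       "objective": "binary",
       "bin_construct_sample_cnt": 50000,  # smaller sample used to build the bins
       "min_data": 100,                    # alias of min_data_in_leaf; a larger value
   }
   # Pass these to lightgbm.train(params, train_set) as usual.
   ```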

--------------

-  **Question 3**: When running LightGBM on a large dataset, my computer runs out of RAM.

-  **Solution 3**: Multiple solutions: set the ``histogram_pool_size`` parameter to the number of MB you want LightGBM to use for histograms (histogram\_pool\_size + dataset size ≈ RAM used),
   lower ``num_leaves``, or lower ``max_bin`` (see `Microsoft/LightGBM#562 <https://github.com/Microsoft/LightGBM/issues/562>`__).
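   For instance, a sketch of such a memory-limiting configuration (the values are illustrative, not recommendations):

   ```python
   # Cap histogram memory and reduce per-tree complexity.
   # Approximate RAM used ≈ histogram_pool_size (in MB) + the Dataset size.
   params = {
       "histogram_pool_size": 1024,  # MB budget for the histogram pool
       "num_leaves": 63,             # lower than before, to shrink each tree
       "max_bin": 127,               # fewer bins -> smaller histograms
   }
   ```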

--------------

-  **Question 4**: I am using Windows. Should I use Visual Studio or MinGW for compiling LightGBM?

-  **Solution 4**: Visual Studio `performs best for LightGBM <https://github.com/Microsoft/LightGBM/issues/542>`__.

--------------

-  **Question 5**: When using LightGBM GPU, I cannot reproduce results over several runs.

-  **Solution 5**: This is normal and expected behaviour, but you may try to use ``gpu_use_dp = true`` for reproducibility
   (see `Microsoft/LightGBM#560 <https://github.com/Microsoft/LightGBM/pull/560#issuecomment-304561654>`__).
   You may also use the CPU version.
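   A minimal sketch of such a configuration (illustrative only):

   ```python
   # Double precision on the GPU trades some speed for run-to-run
   # reproducibility of the results.
   params = {
       "device": "gpu",
       "gpu_use_dp": True,  # reproducible results across runs
   }
   ```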

--------------

-  **Question 6**: Bagging is not reproducible when changing the number of threads.

-  **Solution 6**: LightGBM bagging is multithreaded, so its output depends on the number of threads used.
   There is `no workaround currently <https://github.com/Microsoft/LightGBM/issues/632>`__.

--------------

-  **Question 7**: I tried to use Random Forest mode, and LightGBM crashes!

-  **Solution 7**: This is expected behaviour for arbitrary parameters. To enable Random Forest mode,
   you must set ``bagging_fraction`` and ``feature_fraction`` to values different from 1, along with a ``bagging_freq``.
   `This thread <https://github.com/Microsoft/LightGBM/issues/691>`__ includes an example.
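   A sketch of a configuration satisfying these constraints (the fractions are illustrative):

   ```python
   # Random Forest mode requires bagging and feature subsampling to be enabled.
   params = {
       "boosting": "rf",
       "bagging_freq": 1,        # perform bagging at every iteration
       "bagging_fraction": 0.8,  # must be different from 1
       "feature_fraction": 0.8,  # must be different from 1
   }
   ```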

--------------

-  **Question 8**: CPU usage is low (around 10%) in Windows when using LightGBM on very large datasets on systems with many cores.

-  **Solution 8**: Please use `Visual Studio <https://visualstudio.microsoft.com/downloads/>`__,
   as it may be `10x faster than MinGW <https://github.com/Microsoft/LightGBM/issues/749>`__, especially for very large trees.

--------------

-  **Question 9**: When I'm trying to specify a categorical column with the ``categorical_feature`` parameter, I get a segmentation fault.

-  **Solution 9**: The column you're trying to pass via ``categorical_feature`` likely contains very large values.
   Categorical features in LightGBM are limited to the int32 range,
   so you cannot pass values greater than ``Int32.MaxValue`` (2147483647) as categorical features
   (see `Microsoft/LightGBM#1359 <https://github.com/Microsoft/LightGBM/issues/1359>`__).
   You should first convert them to integers ranging from zero to the number of categories.
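   For example, a hypothetical ``user_id`` column holding huge identifiers can be re-encoded with pandas before training:

   ```python
   import pandas as pd

   # Hypothetical column with values far beyond the int32 range.
   df = pd.DataFrame({"user_id": [10**12 + 7, 10**12 + 7, 5, 42, 5]})

   # Map each distinct value to a small integer code in [0, n_categories).
   df["user_id_cat"] = df["user_id"].astype("category").cat.codes.astype("int32")

   print(df["user_id_cat"].tolist())
   ```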

--------------

R-package
~~~~~~~~~

-  **Question 1**: Any training command using LightGBM does not work after an error occurred during the training of a previous LightGBM model.

-  **Solution 1**: Run ``lgb.unloader(wipe = TRUE)`` in the R console and recreate the LightGBM datasets (this will wipe all LightGBM-related variables).
   Because of the pointers involved, choosing not to wipe the variables will not fix the error.
   This is a known issue: `Microsoft/LightGBM#698 <https://github.com/Microsoft/LightGBM/issues/698>`__.

--------------

-  **Question 2**: I used ``setinfo``, tried to print my ``lgb.Dataset``, and now the R console froze!

-  **Solution 2**: Avoid printing the ``lgb.Dataset`` after using ``setinfo``.
   This is a known bug: `Microsoft/LightGBM#539 <https://github.com/Microsoft/LightGBM/issues/539>`__.

--------------

Python-package
~~~~~~~~~~~~~~

-  **Question 1**: I see error messages like this when installing from GitHub using ``python setup.py install``.

   ::

       error: Error: setup script specifies an absolute path:
       /Users/Microsoft/LightGBM/python-package/lightgbm/../../lib_lightgbm.so
       setup() arguments must *always* be /-separated paths relative to the setup.py directory, *never* absolute paths.

-  **Solution 1**: This error should be fixed in the latest version.
   If you still encounter it, try removing the ``lightgbm.egg-info`` folder in your Python-package and reinstalling,
   or check `this thread on Stack Overflow <http://stackoverflow.com/questions/18085571/pip-install-error-setup-script-specifies-an-absolute-path>`__.

--------------

-  **Question 2**: I see error messages like

   ::

       Cannot get/set label/weight/init_score/group/num_data/num_feature before construct dataset

   but I've already constructed a dataset with code like

   ::

       train = lightgbm.Dataset(X_train, y_train)

   or error messages like

   ::

       Cannot set predictor/reference/categorical feature after freed raw data, set free_raw_data=False when construct Dataset to avoid this.

-  **Solution 2**: Because LightGBM constructs bin mappers to build trees, and the train and validation Datasets within one Booster share the same bin mappers,
   categorical features, feature names, etc., the Dataset objects are constructed when constructing a Booster.
   If you set ``free_raw_data=True`` (the default), the raw data (the original Python data structure) will be freed.
   So, if you want to:

   -  get the label (or weight/init\_score/group) before constructing a dataset: this is the same as getting ``self.label``

   -  set the label (or weight/init\_score/group) before constructing a dataset: this is the same as ``self.label = some_label_array``

   -  get num\_data (or num\_feature) before constructing a dataset: you can get the data with ``self.data``,
      then, if your data is a ``numpy.ndarray``, use something like ``self.data.shape``

   -  set the predictor (or reference/categorical feature) after constructing a dataset:
      you should set ``free_raw_data=False`` or initialize a Dataset object with the same raw data
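   The last point can be sketched as follows (assuming the ``lightgbm`` Python package is installed; the data here is random and purely illustrative):

   ```python
   import numpy as np
   import lightgbm as lgb

   X_train = np.random.rand(100, 5)
   y_train = np.random.randint(0, 2, size=100)

   # Keep the raw numpy data alive so predictor/reference/categorical settings
   # can still be changed after a Booster has constructed the Dataset.
   train = lgb.Dataset(X_train, label=y_train, free_raw_data=False)

   # Before construction, inspect the raw data instead of calling num_data():
   n_rows, n_cols = train.data.shape
   print(n_rows, n_cols)
   ```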