.. role:: raw-html(raw)
    :format: html

LightGBM FAQ
############

.. contents:: LightGBM Frequently Asked Questions
    :depth: 1
    :local:
    :backlinks: none

------

Please post questions, feature requests, and bug reports at https://github.com/microsoft/LightGBM/issues.

This project is mostly maintained by volunteers, so please be patient.
If your request is time-sensitive or more than a month goes by without a response, please tag the maintainers below for help.

-  `@guolinke <https://github.com/guolinke>`__ **Guolin Ke**
-  `@shiyu1994 <https://github.com/shiyu1994>`__ **Yu Shi**
-  `@jameslamb <https://github.com/jameslamb>`__ **James Lamb**
-  `@jmoralez <https://github.com/jmoralez>`__ **José Morales**

--------------

General LightGBM Questions
==========================

.. contents::
    :local:
    :backlinks: none

1. Where do I find more details about LightGBM parameters?
----------------------------------------------------------

Take a look at `Parameters <./Parameters.rst>`__.

2. On datasets with millions of features, training does not start (or starts after a very long time).
-----------------------------------------------------------------------------------------------------

Use a smaller value for ``bin_construct_sample_cnt`` and a larger value for ``min_data``.
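
As an illustration, these two adjustments could look like the following parameter dictionary for the Python package (the values are made up for the example, not tuned recommendations):

.. code-block:: python

    # Illustrative values only -- tune them for your own data.
    params = {
        "bin_construct_sample_cnt": 20000,  # smaller than the default, so fewer rows
                                            # are sampled when choosing bin boundaries
        "min_data_in_leaf": 500,            # larger than the default; ``min_data`` is
                                            # an alias of this parameter
    }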

3. When running LightGBM on a large dataset, my computer runs out of RAM.
-------------------------------------------------------------------------

**Multiple Solutions**: set the ``histogram_pool_size`` parameter to the number of MB you want LightGBM to use for histograms (histogram\_pool\_size + dataset size approximately equals the RAM used),
lower ``num_leaves``, or lower ``max_bin`` (see `Microsoft/LightGBM#562 <https://github.com/microsoft/LightGBM/issues/562>`__).
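
For example, the three levers might be combined like this in the Python package (illustrative values, assuming roughly 1 GB is reserved for histograms):

.. code-block:: python

    # Illustrative values that trade a little accuracy for lower memory use.
    params = {
        "histogram_pool_size": 1024,  # cap the histogram cache at about 1024 MB
        "num_leaves": 63,             # fewer leaves -> fewer histograms held in memory
        "max_bin": 127,               # coarser bins -> smaller per-feature histograms
    }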

4. I am using Windows. Should I use Visual Studio or MinGW for compiling LightGBM?
----------------------------------------------------------------------------------

Visual Studio `performs best for LightGBM <https://github.com/microsoft/LightGBM/issues/542>`__.

5. When using LightGBM GPU, I cannot reproduce results over several runs.
-------------------------------------------------------------------------

This is normal and expected behaviour, but you may try to use ``gpu_use_dp = true`` for reproducibility
(see `Microsoft/LightGBM#560 <https://github.com/microsoft/LightGBM/pull/560#issuecomment-304561654>`__).
You may also use the CPU version.
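
For example, in the Python package this could be expressed as follows (a sketch; expect slower training with double precision):

.. code-block:: python

    # Double precision on GPU is slower but makes results reproducible across runs.
    params = {
        "device": "gpu",
        "gpu_use_dp": True,
    }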

6. Bagging is not reproducible when changing the number of threads.
-------------------------------------------------------------------

:raw-html:`<strike>`
LightGBM bagging is multithreaded, so its output depends on the number of threads used.
There is `no workaround currently <https://github.com/microsoft/LightGBM/issues/632>`__.
:raw-html:`</strike>`

Starting from `#2804 <https://github.com/microsoft/LightGBM/pull/2804>`__, the bagging result no longer depends on the number of threads,
so this issue is solved in the latest versions.

7. I tried to use Random Forest mode, and LightGBM crashes!
-----------------------------------------------------------

This is expected behaviour for arbitrary parameters. To enable Random Forest,
you must use ``bagging_fraction`` and ``feature_fraction`` different from 1, along with a ``bagging_freq``.
`This thread <https://github.com/microsoft/LightGBM/issues/691>`__ includes an example.
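
A minimal sketch of a valid Random Forest configuration (the fractions are illustrative):

.. code-block:: python

    # Random Forest mode requires row subsampling (bagging_fraction < 1 together
    # with bagging_freq > 0); column subsampling via feature_fraction is typical too.
    params = {
        "boosting": "rf",
        "bagging_fraction": 0.8,
        "bagging_freq": 1,
        "feature_fraction": 0.8,
    }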

8. CPU usage is low (like 10%) in Windows when using LightGBM on very large datasets with many-core systems.
------------------------------------------------------------------------------------------------------------

Please use `Visual Studio <https://visualstudio.microsoft.com/downloads/>`__
as it may be `10x faster than MinGW <https://github.com/microsoft/LightGBM/issues/749>`__ especially for very large trees.

9. When I'm trying to specify a categorical column with the ``categorical_feature`` parameter, I get the following sequence of warnings, but there are no negative values in the column.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

.. code-block:: console

   [LightGBM] [Warning] Met negative value in categorical features, will convert it to NaN
   [LightGBM] [Warning] There are no meaningful features, as all feature values are constant.

The column you're trying to pass via ``categorical_feature`` likely contains very large values.
Categorical features in LightGBM are limited by int32 range,
so you cannot pass values that are greater than ``Int32.MaxValue`` (2147483647) as categorical features (see `Microsoft/LightGBM#1359 <https://github.com/microsoft/LightGBM/issues/1359>`__).
You should convert them to integers ranging from zero to the number of categories first.
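
One way to do that recoding with ``numpy`` (the huge IDs below are hypothetical):

.. code-block:: python

    import numpy as np

    # A hypothetical categorical column whose IDs exceed Int32.MaxValue (2147483647).
    raw = np.array([5_000_000_000, 123, 5_000_000_000, 987])

    # np.unique(..., return_inverse=True) maps every distinct value to an
    # integer code in [0, number_of_categories), which is safe to pass to
    # LightGBM as a categorical feature.
    uniques, codes = np.unique(raw, return_inverse=True)

Here ``codes`` is ``[2, 0, 2, 1]``, since ``uniques`` is sorted in ascending order.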

10. LightGBM crashes randomly with the error like: ``Initializing libiomp5.dylib, but found libomp.dylib already initialized.``
-------------------------------------------------------------------------------------------------------------------------------

.. code-block:: console

   OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
   OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

**Possible Cause**: This error means that you have multiple OpenMP libraries installed on your machine and they conflict with each other.
(File extensions in the error message may differ depending on the operating system).

If you are using Python distributed by Conda, then it is highly likely that the error is caused by the ``numpy`` package from Conda, which includes the ``mkl`` package, which in turn conflicts with the system-wide library.
In this case you can update the ``numpy`` package in Conda, or replace Conda's OpenMP library with the system-wide one by creating a symlink to it in the Conda environment folder ``$CONDA_PREFIX/lib``.

**Solution**: Assuming you are using macOS with Homebrew, the following command overwrites the OpenMP library files in the currently active Conda environment with symlinks to the system-wide ones installed by Homebrew:

.. code-block:: bash

   ln -sf `ls -d "$(brew --cellar libomp)"/*/lib`/* $CONDA_PREFIX/lib

The fix described above worked fine before the release of OpenMP 8.0.0.
Starting from version 8.0.0, the Homebrew formula for OpenMP includes the ``-DLIBOMP_INSTALL_ALIASES=OFF`` option, which means the fix no longer works.
However, you can create symlinks to the library aliases manually:

.. code-block:: bash

   for LIBOMP_ALIAS in libgomp.dylib libiomp5.dylib libomp.dylib; do sudo ln -sf "$(brew --cellar libomp)"/*/lib/libomp.dylib $CONDA_PREFIX/lib/$LIBOMP_ALIAS; done

Another workaround would be removing MKL optimizations from Conda's packages completely:

.. code-block:: bash

    conda install nomkl

If this is not your case, then you should find conflicting OpenMP library installations on your own and leave only one of them.

11. LightGBM hangs when multithreading (OpenMP) and using forking in Linux at the same time.
--------------------------------------------------------------------------------------------

Use ``nthreads=1`` to disable multithreading in LightGBM. There is a bug in OpenMP which hangs forked sessions
when multithreading is activated. A more expensive solution is to use new processes instead of fork; however,
keep in mind that creating new processes requires copying memory and loading libraries (for example, if you want to
fork your current process 16 times, then you will need to make 16 copies of your dataset in memory)
(see `Microsoft/LightGBM#1789 <https://github.com/microsoft/LightGBM/issues/1789#issuecomment-433713383>`__).
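
Both mitigations can be sketched in Python like this (``num_threads`` is an alias of ``nthreads``; the rest is standard library):

.. code-block:: python

    import multiprocessing

    # Mitigation 1: disable LightGBM's own multithreading in forked workers.
    params = {"num_threads": 1}

    # Mitigation 2: avoid fork entirely by starting fresh processes with the
    # "spawn" start method (more expensive: memory is copied and libraries
    # are reloaded in every child process).
    ctx = multiprocessing.get_context("spawn")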

An alternative, if multithreading is really necessary inside the forked sessions, would be to compile LightGBM with
the Intel toolchain, as Intel compilers are unaffected by this bug.

For C/C++ users, no OpenMP feature can be used before the fork happens. If an OpenMP feature is used before the
fork (for example, using OpenMP to launch threads), OpenMP will hang inside the forked sessions. Use new processes
instead of forking, copying memory as required (or use Intel compilers).

Cloud platform container services may cause LightGBM to hang if they use Linux fork to run multiple containers on a
single instance. For example, LightGBM hangs in AWS Batch array jobs, which `use the ECS agent
<https://aws.amazon.com/batch/faqs>`__ to manage multiple running jobs. Setting ``nthreads=1`` mitigates the issue.

12. Why is early stopping not enabled by default in LightGBM?
-------------------------------------------------------------

Early stopping involves choosing a validation set, a special type of holdout which is used to evaluate the current state of the model after each iteration to see if training can stop.

In ``LightGBM``, `we have decided to require that users specify this set directly <./Parameters.rst#valid>`_. Many options exist for splitting training data into training, test, and validation sets.

The appropriate splitting strategy depends on the task and domain of the data, information that a modeler has but which ``LightGBM`` as a general-purpose tool does not.
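
As a sketch, even a simple random holdout split made by hand, which you would then pass to LightGBM as the validation set, looks like this (a time-based or grouped split may be more appropriate for your domain):

.. code-block:: python

    # Stand-in for your training rows; any indexable collection works.
    rows = list(range(100))

    # Hold out the last 20% of rows as a validation set.
    split_point = int(len(rows) * 0.8)
    train_rows = rows[:split_point]
    valid_rows = rows[split_point:]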

13. Does LightGBM support directly loading data from zero-based or one-based LibSVM format files?
--------------------------------------------------------------------------------------------------

LightGBM supports directly loading data from zero-based LibSVM format files.

14. Why can't CMake find the compiler when compiling LightGBM with MinGW?
--------------------------------------------------------------------------

.. code-block:: console

    CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
    CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage

This is a known issue of CMake when using MinGW. The easiest solution is to re-run your ``cmake`` command to bypass this one-time stopper from CMake. Alternatively, you can upgrade CMake to at least version 3.17.0.

See `Microsoft/LightGBM#3060 <https://github.com/microsoft/LightGBM/issues/3060#issuecomment-626338538>`__ for more details.

15. Where can I find LightGBM's logo to use it in my presentation?
------------------------------------------------------------------

You can find LightGBM's logo in different file formats and resolutions `here <https://github.com/microsoft/LightGBM/tree/master/docs/logo>`__.

16. LightGBM crashes randomly or operating system hangs during or after running LightGBM.
-----------------------------------------------------------------------------------------

**Possible Cause**: This behavior may indicate that you have multiple OpenMP libraries installed on your machine and they conflict with each other, similar to FAQ #10 above.

If you are using any Python package that depends on ``threadpoolctl``, you may also see the following warning in your logs in this case:

.. code-block:: console

    /root/miniconda/envs/test-env/lib/python3.8/site-packages/threadpoolctl.py:546: RuntimeWarning:
    Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at
    the same time. Both libraries are known to be incompatible and this
    can cause random crashes or deadlocks on Linux when loaded in the
    same Python program.
    Using threadpoolctl may cause crashes or deadlocks. For more
    information and possible workarounds, please see
        https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md

Detailed description of conflicts between multiple OpenMP instances is provided in the `following document <https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md>`__.

**Solution**: Assuming you are using the LightGBM Python-package and conda as a package manager, we strongly recommend using the ``conda-forge`` channel as the only source of all your Python package installations, because it contains built-in patches to work around OpenMP conflicts. Some other workarounds are listed `here <https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md>`__ under the "Workarounds for Intel OpenMP and LLVM OpenMP case" section.

If this is not your case, then you should find conflicting OpenMP library installations on your own and leave only one of them.

17. Loading LightGBM fails like: ``cannot allocate memory in static TLS block``
-------------------------------------------------------------------------------

When loading LightGBM, you may encounter errors like the following.

.. code-block:: console

   lib/libgomp.so.1: cannot allocate memory in static TLS block

This most commonly happens on aarch64 Linux systems.

``gcc``'s OpenMP library (``libgomp.so``) tries to allocate a small amount of static thread-local storage ("TLS")
when it's dynamically loaded.

That error can happen when the loader isn't able to find a large enough block of memory.

On aarch64 Linux, processes and loaded libraries share the same pool of static TLS,
which makes such failures more likely. See these discussions:

* https://bugzilla.redhat.com/show_bug.cgi?id=1722181#c6
* https://gcc.gnu.narkive.com/vOXMQqLA/failure-to-dlopen-libgomp-due-to-static-tls-data

If you are experiencing this issue when using the ``lightgbm`` Python package, try upgrading
to at least ``v4.6.0``.

For older versions of the Python package, or for other LightGBM APIs, this issue can
often be avoided by loading ``libgomp.so.1``. That can be done directly by setting environment
variable ``LD_PRELOAD``, like this:

.. code-block:: console

    export LD_PRELOAD=/root/miniconda3/envs/test-env/lib/libgomp.so.1

It can also be done indirectly by changing the order that other libraries are loaded
into processes, which varies by programming language and application type.

For more details, see these discussions:

* https://github.com/microsoft/LightGBM/pull/6654#issuecomment-2352014275
* https://github.com/microsoft/LightGBM/issues/6509
* https://maskray.me/blog/2021-02-14-all-about-thread-local-storage
* https://bugzilla.redhat.com/show_bug.cgi?id=1722181#c6

------

R-package
=========

.. contents::
    :local:
    :backlinks: none

1. Any training command using LightGBM does not work after an error occurred during the training of a previous LightGBM model.
------------------------------------------------------------------------------------------------------------------------------

In older versions of the R package (prior to ``v3.3.0``), this could happen occasionally and the solution was to run ``lgb.unloader(wipe = TRUE)`` to remove all LightGBM-related objects. Some conversation about this could be found in `Microsoft/LightGBM#698 <https://github.com/microsoft/LightGBM/issues/698>`__.

That is no longer necessary as of ``v3.3.0``, and function ``lgb.unloader()`` has since been removed from the R package.

2. I used ``setinfo()``, tried to print my ``lgb.Dataset``, and now the R console froze!
----------------------------------------------------------------------------------------

As of at least LightGBM v3.3.0, this issue has been resolved and printing a ``Dataset`` object does not cause the console to freeze.

In older versions, avoid printing the ``Dataset`` after calling ``setinfo()``.

As of LightGBM v4.0.0, ``setinfo()`` has been replaced by a new method, ``set_field()``.

3. ``error in data.table::data.table()...argument 2 is NULL``.
--------------------------------------------------------------

If you are experiencing this error when running ``lightgbm``, you may be facing the same issue reported in `#2715 <https://github.com/microsoft/LightGBM/issues/2715>`_ and later in `#2989 <https://github.com/microsoft/LightGBM/pull/2989#issuecomment-614374151>`_. We have seen that in some situations, using ``data.table`` 1.11.x results in this error. To get around this, you can upgrade your version of ``data.table`` to at least version 1.12.0.

4. ``package/dependency ‘Matrix’ is not available ...``
-------------------------------------------------------

In April 2024, ``Matrix==1.7-0`` was published to CRAN.
That version had a floor of ``R (>=4.4.0)``.
``{Matrix}`` is a hard runtime dependency of ``{lightgbm}``, so on any version of R older than ``4.4.0``, running ``install.packages("lightgbm")`` results in something like the following.

.. code-block:: text

    package ‘Matrix’ is not available for this version of R

To fix that without upgrading to R 4.4.0 or greater, manually install an older version of ``{Matrix}``.

.. code-block:: R

    install.packages('https://cran.r-project.org/src/contrib/Archive/Matrix/Matrix_1.6-5.tar.gz', repos = NULL)

------

Python-package
==============

.. contents::
    :local:
    :backlinks: none

1. ``Error: setup script specifies an absolute path`` when installing from GitHub using ``python setup.py install``.
--------------------------------------------------------------------------------------------------------------------

.. note::
    As of v4.0.0, ``lightgbm`` does not support directly invoking ``setup.py``.
    This answer refers only to versions of ``lightgbm`` prior to v4.0.0.

.. code-block:: console

   error: Error: setup script specifies an absolute path:
   /Users/Microsoft/LightGBM/python-package/lightgbm/../../lib_lightgbm.so
   setup() arguments must *always* be /-separated paths relative to the setup.py directory, *never* absolute paths.

This error should be solved in the latest version.
If you still encounter this error, try removing the ``lightgbm.egg-info`` folder in your Python-package and reinstalling,
or check `this thread on stackoverflow <https://stackoverflow.com/questions/18085571/pip-install-error-setup-script-specifies-an-absolute-path>`__.

2. Error messages: ``Cannot ... before construct dataset``.
-----------------------------------------------------------

I see error messages like...

.. code-block:: console

   Cannot get/set label/weight/init_score/group/num_data/num_feature before construct dataset

but I've already constructed a Dataset with code like:

.. code-block:: python

    train = lightgbm.Dataset(X_train, y_train)

or error messages like

.. code-block:: console

    Cannot set predictor/reference/categorical feature after freed raw data, set free_raw_data=False when construct Dataset to avoid this.

**Solution**: Because LightGBM constructs bin mappers to build trees, and because the train and valid Datasets within one Booster share the same bin mappers,
categorical features, feature names, etc., the Dataset objects are constructed when constructing a Booster.
If you set ``free_raw_data=True`` (the default), the raw data (the original Python data structure) will be freed.
So, if you want to:

-  get the label (or weight/init\_score/group/data) before constructing a Dataset, it's the same as getting ``self.label``;

-  set the label (or weight/init\_score/group) before constructing a Dataset, it's the same as ``self.label = some_label_array``;

-  get num\_data (or num\_feature) before constructing a Dataset, you can get the data with ``self.data``.
   Then, if your data is a ``numpy.ndarray``, use code like ``self.data.shape``. But do not do this after subsetting the Dataset, because you will always get ``None``;

-  set predictor (or reference/categorical feature) after constructing a dataset,
   you should set ``free_raw_data=False`` or init a Dataset object with the same raw data.
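
For example, a minimal sketch of the last point (assuming ``numpy`` and ``lightgbm`` are installed; the data here is random and purely illustrative):

.. code-block:: python

    import numpy as np
    import lightgbm as lgb

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=100).astype(float)

    # Keeping the raw data around (free_raw_data=False) allows setting a
    # reference or categorical features later, after construction.
    train = lgb.Dataset(X, label=y, free_raw_data=False)
    train.construct()
    num_rows = train.num_data()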

3. I encounter segmentation faults (segfaults) randomly after installing LightGBM from PyPI using ``pip install lightgbm``.
---------------------------------------------------------------------------------------------------------------------------

We are doing our best to provide universal wheels that have high running speed and are compatible with any hardware, OS, compiler, etc. at the same time.
However, it is sometimes simply impossible to guarantee that LightGBM can be used in any specific environment (see `Microsoft/LightGBM#1743 <https://github.com/microsoft/LightGBM/issues/1743>`__).

Therefore, the first thing you should try in case of segfaults is **compiling from the source** using ``pip install --no-binary lightgbm lightgbm``.
For the OS-specific prerequisites see https://github.com/microsoft/LightGBM/blob/master/python-package/README.rst.

Also, feel free to post a new issue in our GitHub repository. We always look at each case individually and try to find a root cause.

4. I would like to install LightGBM from conda. What channel should I choose?
-----------------------------------------------------------------------------

We strongly recommend installation from the ``conda-forge`` channel and not from the ``defaults`` one.

For some specific examples, see `this comment <https://github.com/microsoft/LightGBM/issues/4948#issuecomment-1013766397>`__.

In addition, as of ``lightgbm==4.4.0``, the ``conda-forge`` package automatically supports CUDA-based GPU acceleration.