# LightGBM R-package

[![CRAN Version](https://www.r-pkg.org/badges/version/lightgbm)](https://cran.r-project.org/package=lightgbm)
[![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/lightgbm)](https://cran.r-project.org/package=lightgbm)
[![API Docs](https://readthedocs.org/projects/lightgbm/badge/?version=latest)](https://lightgbm.readthedocs.io/en/latest/R/reference/)

<img src="man/figures/logo.svg" align="right" alt="" width="175" />

### Contents

* [Installation](#installation)
    - [Installing the CRAN Package](#installing-the-cran-package)
    - [Installing from Source with CMake](#install)
    - [Installing a GPU-enabled Build](#installing-a-gpu-enabled-build)
    - [Installing Precompiled Binaries](#installing-precompiled-binaries)
    - [Installing from a Pre-compiled lib_lightgbm](#lib_lightgbm)
* [Examples](#examples)
* [Testing](#testing)
    - [Running the Tests](#running-the-tests)
    - [Code Coverage](#code-coverage)
* [Updating Documentation](#updating-documentation)
* [Preparing a CRAN Package](#preparing-a-cran-package)
* [Known Issues](#known-issues)

Installation
------------

For the easiest installation, go to ["Installing the CRAN package"](#installing-the-cran-package).

If you experience any issues with that, try ["Installing from Source with CMake"](#install). This can produce a more efficient version of the library on Windows systems with Visual Studio.

To build a GPU-enabled version of the package, follow the steps in ["Installing a GPU-enabled Build"](#installing-a-gpu-enabled-build).

If any of the above options do not work for you or do not meet your needs, please let the maintainers know by [opening an issue](https://github.com/microsoft/LightGBM/issues).

When your package installation is done, you can quickly check that your LightGBM R-package is working by running the following:

```r
library(lightgbm)
data(agaricus.train, package='lightgbm')
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
model <- lgb.cv(
    params = list(
        objective = "regression"
        , metric = "l2"
    )
    , data = dtrain
)
```

### Installing the CRAN package

`{lightgbm}` is [available on CRAN](https://cran.r-project.org/package=lightgbm), and can be installed with the following R code.

```r
install.packages("lightgbm", repos = "https://cran.r-project.org")
```

This is the easiest way to install `{lightgbm}`. It does not require `CMake` or `Visual Studio`, and should work well on many different operating systems and compilers.

Each CRAN package is also available on [LightGBM releases](https://github.com/microsoft/LightGBM/releases), with a name like `lightgbm-{VERSION}-r-cran.tar.gz`.

#### Custom Installation (Linux, Mac)

The steps above should work on most systems, but users with highly customized environments might want to change how R builds packages from source.

To change the compiler used when installing the CRAN package, you can create a file `~/.R/Makevars` which overrides `CC` (`C` compiler) and `CXX` (`C++` compiler).

For example, to use `gcc` instead of `clang` on Mac, you could use something like the following:

```make
# ~/.R/Makevars
CC=gcc-8
CXX=g++-8
CXX11=g++-8
```
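
If you would rather script that change than edit the file by hand, the following is a minimal sketch. The function name `write_makevars` and the backup file name `Makevars.bak` are assumptions made for illustration, and the compiler names are placeholders for whatever is installed on your machine.

```shell
# Hypothetical helper (not part of LightGBM): write an R Makevars file
# that points CC, CXX, and CXX11 at the given compilers.
# Usage: write_makevars <home-dir> <c-compiler> <c++-compiler>
write_makevars() {
    home_dir="$1"
    cc="$2"
    cxx="$3"
    mkdir -p "${home_dir}/.R"
    makevars="${home_dir}/.R/Makevars"
    # keep a copy of any existing file instead of silently overwriting it
    if [ -f "${makevars}" ]; then
        cp "${makevars}" "${makevars}.bak"
    fi
    printf 'CC=%s\nCXX=%s\nCXX11=%s\n' "${cc}" "${cxx}" "${cxx}" > "${makevars}"
}

# example: write into a scratch directory rather than the real home directory
scratch_home="$(mktemp -d)"
write_makevars "${scratch_home}" gcc-8 g++-8
cat "${scratch_home}/.R/Makevars"
```

To apply it for real, pass `"${HOME}"` as the first argument.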

### Installing from Source with CMake <a id="install"></a>

You need to install git and [CMake](https://cmake.org/) first.

Note: this method is only supported on 64-bit systems. If you need to run LightGBM on 32-bit Windows (i386), follow the instructions in ["Installing the CRAN Package"](#installing-the-cran-package).
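
Before starting, it can help to confirm that the required tools are visible on `PATH`. The sketch below is for a POSIX shell (Linux, Mac, or an MSYS2 shell on Windows). The function name `missing_tools` is made up for this example.

```shell
# Hypothetical helper: print the name of each given command that cannot be
# found on PATH, one per line. Prints nothing if all of them are found.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "${tool}" > /dev/null 2>&1; then
            echo "${tool}"
        fi
    done
}

# git, cmake, and Rscript are the tools used by the steps in this section
missing="$(missing_tools git cmake Rscript)"
if [ -n "${missing}" ]; then
    # unquoted on purpose so the names are joined onto one line
    echo "not found on PATH:" ${missing}
else
    echo "all required tools found"
fi
```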

#### Windows Preparation

NOTE: Windows users may need to run with administrator rights (either R or the command prompt, depending on the way you are installing this package).

Installing a 64-bit version of [Rtools](https://cran.r-project.org/bin/windows/Rtools/) is mandatory.

After installing `Rtools` and `CMake`, be sure the following paths are added to the environment variable `PATH`. These may have been automatically added when installing other software.

* `Rtools`
    - If you have `Rtools` 4.0, example:
        - `C:\rtools40\mingw64\bin`
        - `C:\rtools40\usr\bin`
    - If you have `Rtools` 4.2+, example:
        - `C:\rtools42\x86_64-w64-mingw32.static.posix\bin`
        - `C:\rtools42\usr\bin`
        - **NOTE**: this is e.g. `rtools43\` for R 4.3
* `CMake`
    - example: `C:\Program Files\CMake\bin`
* `R`
    - example: `C:\Program Files\R\R-4.5.1\bin`

NOTE: Two `Rtools` paths are required from `Rtools` 4.0 onwards because the paths and the list of included software were changed in `Rtools` 4.0.

NOTE: `Rtools42` and later take a very different approach to the compiler toolchain than previous releases, and how you install it changes what is required to build packages. See ["Howto: Building R 4.2 and packages on Windows"](https://cran.r-project.org/bin/windows/base/howto-R-4.2.html).

#### Windows Toolchain Options

A "toolchain" refers to the collection of software used to build the library. The R-package can be built with three different toolchains.

**Warning for Windows users**: *Visual Studio* is recommended for its better multi-threading efficiency on Windows on many-core systems. For very simple systems (dual-core computers or worse), MinGW64 is recommended for maximum performance. If you do not know what to choose, use [Visual Studio](https://visualstudio.microsoft.com/downloads/), the default compiler. **Do not use MinGW on Windows on many-core systems. It may be 10x slower than Visual Studio.**

**Visual Studio (default)**

By default, the package will be built with [Visual Studio Build Tools](https://visualstudio.microsoft.com/downloads/).

**MSYS2 (R 4.x)**

If you are using R 4.x and installation fails with Visual Studio, `LightGBM` will fall back to using [MSYS2](https://www.msys2.org/). This should work with the tools already bundled in `Rtools` 4.0.

If you want to force `LightGBM` to use MSYS2 (for any R version), pass `--use-msys2` to the installation script.

```shell
Rscript build_r.R --use-msys2
```

**MinGW**

If you want to force `LightGBM` to use [MinGW](https://www.mingw-w64.org/) (for any R version), pass `--use-mingw` to the installation script.

```shell
Rscript build_r.R --use-mingw
```

#### Mac OS Preparation

You can perform installation with either **Apple Clang** or **gcc**.

If you prefer **Apple Clang**, install **OpenMP** first. Details can be found in the [Installation Guide](https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#apple-clang).

If you prefer **gcc**, install it (details can be found in the [Installation Guide](https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#gcc)) and set some environment variables to tell R to use `gcc` and `g++`. If you install these from Homebrew, your versions of `g++` and `gcc` are most likely in `/usr/local/bin`, as shown below.

```shell
# replace 8 with version of gcc installed on your machine
export CXX=/usr/local/bin/g++-8 CC=/usr/local/bin/gcc-8
```

#### Install with CMake

After following the "preparation" steps above for your operating system, build and install the R-package with the following commands:

```sh
git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
Rscript build_r.R
```

The `build_r.R` script builds the package in a temporary directory called `lightgbm_r`. It will destroy and recreate that directory each time you run the script. That script supports the following command-line options:

- `--no-build-vignettes`: Skip building vignettes.
- `-j[jobs]`: Number of threads to use when compiling LightGBM. E.g., `-j4` will try to compile 4 objects at a time.
    - by default, this script uses single-thread compilation
    - for best results, set `-j` to the number of physical CPUs
- `--skip-install`: Build the package tarball, but do not install it.
- `--use-gpu`: Build a GPU-enabled version of the library.
- `--use-mingw`: Force the use of MinGW toolchain, regardless of R version.
- `--use-msys2`: Force the use of MSYS2 toolchain, regardless of R version.
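
As an illustration of combining these options, the sketch below assembles a `build_r.R` command with a `-j` value taken from `getconf _NPROCESSORS_ONLN`. That involves two assumptions: `getconf` reports logical CPUs, which can exceed the number of physical CPUs recommended above, and the function name `build_r_command` is invented for this example. The command is printed rather than executed, so you can inspect it first.

```shell
# Hypothetical helper: print a build_r.R invocation using the given number
# of jobs plus any extra flags.
# Usage: build_r_command <n-jobs> [flags...]
build_r_command() {
    n_jobs="$1"
    shift
    echo "Rscript build_r.R -j${n_jobs}" "$@"
}

# Logical CPU count; fall back to 1 if getconf cannot report it.
n_cpus="$(getconf _NPROCESSORS_ONLN 2> /dev/null || echo 1)"

build_r_command "${n_cpus}" --skip-install --no-build-vignettes
```

Running the printed command from the root of the repository performs the build.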

Note: for the build with Visual Studio/VS Build Tools in Windows, you should use the Windows CMD or PowerShell.

### Installing a GPU-enabled Build

You will need to install Boost and OpenCL first: details for installation can be found in [Installation-Guide](https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#build-gpu-version).

After installing these other libraries, follow the steps in ["Installing from Source with CMake"](#install). When you reach the step that mentions `build_r.R`, pass the flag `--use-gpu`.

```shell
Rscript build_r.R --use-gpu
```

You may also need or want to provide additional configuration, depending on your setup. For example, you may need to provide locations for Boost and OpenCL.

```shell
Rscript build_r.R \
    --use-gpu \
    --opencl-library=/usr/lib/x86_64-linux-gnu/libOpenCL.so \
    --boost-librarydir=/usr/lib/x86_64-linux-gnu
```

The following options correspond to the [CMake FindBoost options](https://cmake.org/cmake/help/latest/module/FindBoost.html) by the same names.

* `--boost-root`
* `--boost-dir`
* `--boost-include-dir`
* `--boost-librarydir`

The following options correspond to the [CMake FindOpenCL options](https://cmake.org/cmake/help/latest/module/FindOpenCL.html) by the same names.

* `--opencl-include-dir`
* `--opencl-library`

### Installing Precompiled Binaries

Precompiled binaries for Mac and Windows are prepared by CRAN a few days after each release to CRAN. They can be installed with the following R code.

```r
install.packages(
    "lightgbm"
    , type = "both"
    , repos = "https://cran.r-project.org"
)
```

These packages do not require compilation, so they will be faster and easier to install than packages that are built from source.

CRAN does not prepare precompiled binaries for Linux, and as of this writing neither does this project.

### Installing from a Pre-compiled lib_lightgbm <a id="lib_lightgbm"></a>

Previous versions of LightGBM offered the ability to first compile the C++ library (`lib_lightgbm.{dll,dylib,so}`) and then build an R-package that wraps it.

As of version 3.0.0, this is no longer supported. If building from source is difficult for you, please [open an issue](https://github.com/microsoft/LightGBM/issues).

Examples
--------

Please visit [demo](https://github.com/microsoft/LightGBM/tree/master/R-package/demo):

* [Basic walkthrough of wrappers](https://github.com/microsoft/LightGBM/blob/master/R-package/demo/basic_walkthrough.R)
* [Boosting from existing prediction](https://github.com/microsoft/LightGBM/blob/master/R-package/demo/boost_from_prediction.R)
* [Early Stopping](https://github.com/microsoft/LightGBM/blob/master/R-package/demo/early_stopping.R)
* [Cross Validation](https://github.com/microsoft/LightGBM/blob/master/R-package/demo/cross_validation.R)
* [Multiclass Training/Prediction](https://github.com/microsoft/LightGBM/blob/master/R-package/demo/multiclass.R)
* [Leaf (in)Stability](https://github.com/microsoft/LightGBM/blob/master/R-package/demo/leaf_stability.R)
* [Weight-Parameter Adjustment Relationship](https://github.com/microsoft/LightGBM/blob/master/R-package/demo/weight_param.R)

Testing
-------

The R-package's unit tests are run automatically on every commit, via integrations like [GitHub Actions](https://github.com/microsoft/LightGBM/actions). Adding new tests in `R-package/tests/testthat` is a valuable way to improve the reliability of the R-package.

### Running the Tests

While developing the R-package, run the code below to run the unit tests.

```shell
sh build-cran-package.sh \
    --no-build-vignettes

R CMD INSTALL --with-keep.source lightgbm*.tar.gz
cd R-package/tests
Rscript testthat.R
```

To run the tests with more verbose logs, set environment variable `LIGHTGBM_TEST_VERBOSITY` to a valid value for parameter [`verbosity`](https://lightgbm.readthedocs.io/en/latest/Parameters.html#verbosity).

```shell
export LIGHTGBM_TEST_VERBOSITY=1
cd R-package/tests
Rscript testthat.R
```
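
`export` keeps the variable set for the rest of the shell session. To limit it to a single command, standard shell syntax lets you prefix the command with the assignment. The sketch below demonstrates that scoping with a stand-in command (`sh -c`), an assumption made so the example runs without R installed.

```shell
# Prefix form: the variable is visible to this one command only.
LIGHTGBM_TEST_VERBOSITY=1 sh -c 'echo "inside: ${LIGHTGBM_TEST_VERBOSITY}"'

# Afterwards it is still unset in the calling shell
# (unless it had already been exported earlier in the session).
echo "outside: ${LIGHTGBM_TEST_VERBOSITY:-unset}"
```

With R installed, the same pattern applies to the test run: `LIGHTGBM_TEST_VERBOSITY=1 Rscript testthat.R`.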

### Code Coverage

When adding tests, you may want to use test coverage to identify untested areas and to check if the tests you've added are covering all branches of the intended code.

The example below shows how to generate code coverage for the R-package on a macOS or Linux setup. To adjust for your environment, refer to [the customization step described above](#custom-installation-linux-mac).

```shell
# Install
sh build-cran-package.sh \
    --no-build-vignettes

# Get coverage
Rscript -e " \
    library(covr);
    coverage <- covr::package_coverage('./lightgbm_r', type = 'tests', quiet = FALSE);
    print(coverage);
    covr::report(coverage, file = file.path(getwd(), 'coverage.html'), browse = TRUE);
    "
```

Updating Documentation
----------------------

The R-package uses [`{roxygen2}`](https://CRAN.R-project.org/package=roxygen2) to generate its documentation.
The generated `DESCRIPTION`, `NAMESPACE`, and `man/` files are checked into source control.
To regenerate those files, run the following.

```shell
Rscript \
    --vanilla \
    -e "install.packages('roxygen2', repos = 'https://cran.rstudio.com')"

sh build-cran-package.sh --no-build-vignettes
R CMD INSTALL \
  --with-keep.source \
  ./lightgbm_*.tar.gz

cd R-package
Rscript \
    --vanilla \
    -e "roxygen2::roxygenize(load = 'installed')"
```

Preparing a CRAN Package
------------------------

This section is primarily for maintainers, but may help users and contributors to understand the structure of the R-package.

Most of `LightGBM` uses `CMake` to handle tasks like setting compiler and linker flags, including header file locations, and linking to other libraries. Because CRAN packages typically do not assume the presence of `CMake`, the R-package uses an alternative method that is in the CRAN-supported toolchain for building R packages with C++ code: `Autoconf`.

For more information on this approach, see ["Writing R Extensions"](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Configure-and-cleanup).

### Build a CRAN Package

From the root of the repository, run the following.

```shell
git submodule update --init --recursive
sh build-cran-package.sh
```

This will create a file `lightgbm_${VERSION}.tar.gz`, where `VERSION` is the version of `LightGBM`.
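
If a later step needs the version string, it can be recovered from the tarball's file name with plain shell parameter expansion. This is a small sketch: the function name `tarball_version` and the version `4.6.0` in the usage line are hypothetical.

```shell
# Hypothetical helper: extract VERSION from a file name of the form
# lightgbm_${VERSION}.tar.gz (any leading directory is ignored).
tarball_version() {
    # drop leading directories
    file_name="${1##*/}"
    # drop the "lightgbm_" prefix
    version="${file_name#lightgbm_}"
    # drop the ".tar.gz" suffix
    version="${version%.tar.gz}"
    echo "${version}"
}

tarball_version "./lightgbm_4.6.0.tar.gz"
```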

That script supports the following command-line options:

- `--no-build-vignettes`: Skip building vignettes.
- `--r-executable=[path-to-executable]`: Use an alternative build of R.

A CRAN package is also generated for every commit to any branch of the repo, and can be found in the "Artifacts" section of the associated Azure Pipelines run.

### Standard Installation from CRAN Package

After building the package, install it with a command like the following:

```shell
R CMD INSTALL lightgbm_*.tar.gz
```

### Changing the CRAN Package

A lot of details are handled automatically by `R CMD build` and `R CMD INSTALL`, so it can be difficult to understand how the files in the R-package are related to each other. An extensive treatment of those details is available in ["Writing R Extensions"](https://cran.r-project.org/doc/manuals/r-release/R-exts.html).

This section briefly explains the key files for building a CRAN package. To update the package, edit the files relevant to your change and re-run the steps in [Build a CRAN Package](#build-a-cran-package).

**Linux or Mac**

At build time, `configure` will be run and used to create a file `Makevars`, using `Makevars.in` as a template.

1. Edit `configure.ac`.
2. Create `configure` with `autoconf`. Do not edit it by hand. This file must be generated on Ubuntu 22.04.

    If you have an Ubuntu 22.04 environment available, run the provided script from the root of the `LightGBM` repository.

    ```shell
    ./R-package/recreate-configure.sh
    ```

    If you do not have easy access to an Ubuntu 22.04 environment, the `configure` script can be generated using Docker by running the code below from the root of this repo.

    ```shell
    docker run \
        --rm \
        -v $(pwd):/opt/LightGBM \
        -w /opt/LightGBM \
        ubuntu:22.04 \
        ./R-package/recreate-configure.sh
    ```

    The version of `autoconf` used by this project is stored in `R-package/AUTOCONF_UBUNTU_VERSION`. To update that version, update that file and run the commands above. To see available versions, see https://packages.ubuntu.com/search?keywords=autoconf.

3. Edit `src/Makevars.in`.

Alternatively, GitHub Actions can regenerate this file for you. On a pull request from a branch in this repository (this does not work for pull requests from forks), create a comment with this phrase:

> /gha run r-configure

**Configuring for Windows**

At build time, `configure.win` will be run and used to create a file `Makevars.win`, using `Makevars.win.in` as a template.

1. Edit `configure.win` directly.
2. Edit `src/Makevars.win.in`.

### Testing the CRAN Package

`{lightgbm}` is tested automatically on every commit, across many combinations of operating system, R version, and compiler. This section describes how to test the package locally while you are developing.

#### Windows, Mac, and Linux

```shell
sh build-cran-package.sh
R CMD check --as-cran lightgbm_*.tar.gz
```

#### <a id="UBSAN"></a>ASAN and UBSAN

All packages uploaded to CRAN must pass builds using `gcc` and `clang`, instrumented with two sanitizers: the Address Sanitizer (ASAN) and the Undefined Behavior Sanitizer (UBSAN).

For more background, see

* [this blog post](https://dirk.eddelbuettel.com/code/sanitizers.html)
* [top-level CRAN documentation on these checks](https://cran.r-project.org/web/checks/check_issue_kinds.html)
* [CRAN's configuration of these checks](https://www.stats.ox.ac.uk/pub/bdr/memtests/README.txt)

You can replicate these checks locally using Docker.
For more information on the image used for testing, see https://github.com/wch/r-debug.

In the code below, environment variable `R_CUSTOMIZATION` should be set to one of two values.

* `"san"` = replicates CRAN's `gcc-ASAN` and `gcc-UBSAN` checks
* `"csan"` = replicates CRAN's `clang-ASAN` and `clang-UBSAN` checks
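
The Docker commands that follow derive the names of the instrumented R executables from that variable, by appending its value to `RD` and `RDscript`. The sketch here makes that mapping explicit and rejects any other value. The function name `r_debug_executables` is made up for illustration.

```shell
# Hypothetical helper: print the R and Rscript executable names that
# correspond to a value of R_CUSTOMIZATION.
r_debug_executables() {
    case "$1" in
        san|csan)
            echo "RD$1 RDscript$1"
            ;;
        *)
            echo "R_CUSTOMIZATION must be 'san' or 'csan', got '$1'" >&2
            return 1
            ;;
    esac
}

r_debug_executables san
r_debug_executables csan
```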

```shell
docker run \
  --rm \
  -it \
  -v $(pwd):/opt/LightGBM \
  -w /opt/LightGBM \
  --env R_CUSTOMIZATION=san \
  wch1/r-debug:latest \
  /bin/bash

# install dependencies
RDscript${R_CUSTOMIZATION} \
  -e "install.packages(c('R6', 'data.table', 'jsonlite', 'knitr', 'markdown', 'Matrix', 'RhpcBLASctl', 'testthat'), repos = 'https://cran.r-project.org', Ncpus = parallel::detectCores())"

# install lightgbm
sh build-cran-package.sh --r-executable=RD${R_CUSTOMIZATION}
RD${R_CUSTOMIZATION} \
  CMD INSTALL lightgbm_*.tar.gz

# run tests
cd R-package/tests
rm -f ./tests.log
RDscript${R_CUSTOMIZATION} testthat.R >> tests.log 2>&1

# check that tests passed
echo "test exit code: $?"
tail -300 ./tests.log
```

#### Valgrind

All packages uploaded to CRAN must be built and tested without raising any issues from `valgrind`. `valgrind` is a profiler that can catch serious issues like memory leaks and illegal writes. For more information, see [this blog post](https://reside-ic.github.io/blog/debugging-and-fixing-crans-additional-checks-errors/).

You can replicate these checks locally using Docker. Note that instrumented versions of R built to use `valgrind` run much slower, and these tests may take as long as 20 minutes to run.

```shell
docker run \
    --rm \
    -v $(pwd):/opt/LightGBM \
    -w /opt/LightGBM \
    -it \
        wch1/r-debug

RDscriptvalgrind -e "install.packages(c('R6', 'data.table', 'jsonlite', 'knitr', 'markdown', 'Matrix', 'RhpcBLASctl', 'testthat'), repos = 'https://cran.rstudio.com', Ncpus = parallel::detectCores())"

sh build-cran-package.sh \
    --r-executable=RDvalgrind

RDvalgrind CMD INSTALL \
    --preclean \
    --install-tests \
        lightgbm_*.tar.gz

cd R-package/tests

RDvalgrind \
    --no-readline \
    --vanilla \
    -d "valgrind --tool=memcheck --leak-check=full --track-origins=yes" \
        -f testthat.R \
2>&1 \
| tee out.log \
| cat
```
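
Once the run finishes, `out.log` can be scanned for problems. The sketch below relies on the `ERROR SUMMARY:` line that `valgrind`'s memcheck tool prints at exit. The function name `valgrind_log_is_clean` is hypothetical, and a real check may also want to look at the leak summary.

```shell
# Hypothetical helper: succeed only if the log contains at least one
# "ERROR SUMMARY:" line and every such line reports 0 errors.
valgrind_log_is_clean() {
    log_file="$1"
    # no summary at all means valgrind did not finish (or was not used)
    grep -q 'ERROR SUMMARY:' "${log_file}" || return 1
    # any summary line not reporting "0 errors" is a failure
    if grep 'ERROR SUMMARY:' "${log_file}" | grep -q -v 'ERROR SUMMARY: 0 errors'; then
        return 1
    fi
    return 0
}

# only inspect the log if a previous run produced one
if [ -f out.log ]; then
    if valgrind_log_is_clean out.log; then
        echo "no valgrind errors reported"
    else
        echo "valgrind reported errors, see out.log"
    fi
fi
```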

These tests can also be triggered on any pull request by leaving the following comment on it:

> /gha run r-valgrind

Known Issues
------------

For information about known issues with the R-package, see the [R-package section of LightGBM's main FAQ page](https://lightgbm.readthedocs.io/en/latest/FAQ.html#r-package).