Commit 08f198a8 authored by Laurae, committed by Guolin Ke

Add more details about GPU (correspondence, targeting) (#423)

* Add GPU Windows document link

* Add GPU targeting and SDK correspondence table

* Add targeting link

* Request default targeting for debugging
# Correspondence Table
When using OpenCL SDKs, targeting the CPU and a GPU at the same time is sometimes possible. This is especially true for the Intel SDK for OpenCL and the AMD APP SDK.
The table below summarizes the correspondence:
| SDK | CPU Intel/AMD | GPU Intel | GPU AMD | GPU NVIDIA |
| --- | :---: | :---: | :---: | :---: |
| [Intel SDK for OpenCL](https://software.intel.com/en-us/articles/opencl-drivers) | Supported | Supported * | Supported | Untested |
| [AMD APP SDK](http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/) | Supported | Untested * | Supported | Untested |
| [NVIDIA CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) | Untested ** | Untested ** | Untested ** | Supported |
Legend:
- \* Not usable directly.
- \** Reported as unsupported in public forums.
Intel SDK for OpenCL supporting AMD GPUs is not a typo, and neither is AMD APP SDK compatibility with CPUs.
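If you are unsure which SDKs (OpenCL platforms) and devices are actually visible on your machine, you can enumerate them from R before consulting this table. The sketch below is only an illustration: it assumes the third-party `OpenCL` CRAN package (unrelated to LightGBM) is installed, and it assumes that the package lists platforms and devices in the same order LightGBM numbers them.
```r
# Minimal sketch, assuming the third-party 'OpenCL' CRAN package
# (install.packages("OpenCL")); it is not part of LightGBM.
library(OpenCL)

platforms <- oclPlatforms()
print(platforms)  # roughly one entry per installed SDK/driver

# Devices within each platform; if the enumeration order matches LightGBM,
# list position i corresponds to an ID of i - 1.
for (i in seq_along(platforms)) {
  cat("OpenCL platform", i - 1, "devices:\n")
  print(oclDevices(platforms[[i]]))
}
```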
---
# Targeting Table
We present the following scenarios:
* CPU, no GPU
* Single CPU and GPU (even with integrated graphics)
* Multiple CPU/GPU
We provide test R code below, but you can use the language of your choice with the examples of your choice:
```r
library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
train$data[, 1] <- 1:6513
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
valids <- list(test = dtest)
params <- list(objective = "regression",
               metric = "rmse",
               device = "gpu",
               gpu_platform_id = 0,
               gpu_device_id = 0,
               nthread = 1,
               boost_from_average = FALSE,
               num_tree_per_iteration = 10,
               max_bin = 32)
model <- lgb.train(params,
                   dtrain,
                   2,
                   valids,
                   min_data = 1,
                   learning_rate = 1,
                   early_stopping_rounds = 10)
```
Using a bad `gpu_device_id` is not critical, as it will fall back to:
* `gpu_device_id = 0` if using `gpu_platform_id = 0`
* `gpu_device_id = 1` if using `gpu_platform_id = 1`
However, using a bad combination of `gpu_platform_id` and `gpu_device_id` will lead to a **crash** (you will lose your entire session content), so beware.
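To avoid guessing, you can check that a requested combination exists before training. Below is a hedged sketch under the same assumptions as the enumeration example above (the third-party `OpenCL` CRAN package, and an enumeration order that matches LightGBM's numbering):
```r
# Hedged helper: TRUE if the requested (platform, device) pair is visible
# through the third-party 'OpenCL' CRAN package; the order matching
# LightGBM's gpu_platform_id/gpu_device_id numbering is an assumption.
library(OpenCL)

pair_exists <- function(gpu_platform_id, gpu_device_id) {
  platforms <- oclPlatforms()
  if (gpu_platform_id + 1 > length(platforms)) return(FALSE)
  devices <- tryCatch(oclDevices(platforms[[gpu_platform_id + 1]]),
                      error = function(e) list())
  gpu_device_id + 1 <= length(devices)
}

# Refuse to train on a combination that does not exist:
if (!pair_exists(1, 0)) {
  stop("No OpenCL platform 1 / device 0 on this machine; not training.")
}
```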
## CPU-only architectures
When you have a single device (one CPU), OpenCL usage is straightforward: `gpu_platform_id = 0`, `gpu_device_id = 0`.
This will use the CPU via OpenCL, even though the log says GPU.
Example:
```r
> params <- list(objective = "regression",
+                metric = "rmse",
+                device = "gpu",
+                gpu_platform_id = 0,
+                gpu_device_id = 0,
+                nthread = 1,
+                boost_from_average = FALSE,
+                num_tree_per_iteration = 10,
+                max_bin = 32)
> model <- lgb.train(params,
+                    dtrain,
+                    2,
+                    valids,
+                    min_data = 1,
+                    learning_rate = 1,
+                    early_stopping_rounds = 10)
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 232
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] Using requested OpenCL platform 0 device 1
[LightGBM] [Info] Using GPU Device: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, Vendor: GenuineIntel
[LightGBM] [Info] Compiling OpenCL Kernel with 16 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 12
[LightGBM] [Info] 40 dense feature groups (0.12 MB) transfered to GPU in 0.004540 secs. 76 sparse feature groups.
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=8
[1]: test's rmse:1.10643e-17
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=7 and max_depth=5
[2]: test's rmse:0
```
## Single CPU and GPU (even with integrated graphics)
If you have an integrated graphics card (Intel HD Graphics) and a dedicated graphics card (AMD, NVIDIA), the dedicated graphics card will automatically override the integrated one. The workaround is to disable your dedicated graphics card so the integrated graphics card can be used.
When you have multiple devices (one CPU and one GPU), the order is usually the following:
* GPU: `gpu_platform_id = 0`, `gpu_device_id = 0`; it is sometimes usable via `gpu_platform_id = 1`, `gpu_device_id = 1`, but at your own risk!
* CPU: `gpu_platform_id = 0`, `gpu_device_id = 1`
Example of GPU (`gpu_platform_id = 0`, `gpu_device_id = 0`):
```r
> params <- list(objective = "regression",
+                metric = "rmse",
+                device = "gpu",
+                gpu_platform_id = 0,
+                gpu_device_id = 0,
+                nthread = 1,
+                boost_from_average = FALSE,
+                num_tree_per_iteration = 10,
+                max_bin = 32)
> model <- lgb.train(params,
+                    dtrain,
+                    2,
+                    valids,
+                    min_data = 1,
+                    learning_rate = 1,
+                    early_stopping_rounds = 10)
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 232
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] Using GPU Device: Oland, Vendor: Advanced Micro Devices, Inc.
[LightGBM] [Info] Compiling OpenCL Kernel with 16 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 12
[LightGBM] [Info] 40 dense feature groups (0.12 MB) transfered to GPU in 0.004211 secs. 76 sparse feature groups.
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=8
[1]: test's rmse:1.10643e-17
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=7 and max_depth=5
[2]: test's rmse:0
```
Example of CPU (`gpu_platform_id = 0`, `gpu_device_id = 1`):
```r
> params <- list(objective = "regression",
+                metric = "rmse",
+                device = "gpu",
+                gpu_platform_id = 0,
+                gpu_device_id = 1,
+                nthread = 1,
+                boost_from_average = FALSE,
+                num_tree_per_iteration = 10,
+                max_bin = 32)
> model <- lgb.train(params,
+                    dtrain,
+                    2,
+                    valids,
+                    min_data = 1,
+                    learning_rate = 1,
+                    early_stopping_rounds = 10)
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 232
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] Using requested OpenCL platform 0 device 1
[LightGBM] [Info] Using GPU Device: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, Vendor: GenuineIntel
[LightGBM] [Info] Compiling OpenCL Kernel with 16 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 12
[LightGBM] [Info] 40 dense feature groups (0.12 MB) transfered to GPU in 0.004540 secs. 76 sparse feature groups.
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=8
[1]: test's rmse:1.10643e-17
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=7 and max_depth=5
[2]: test's rmse:0
```
When using a wrong `gpu_device_id`, it automatically falls back to `gpu_device_id = 0`:
```r
> params <- list(objective = "regression",
+                metric = "rmse",
+                device = "gpu",
+                gpu_platform_id = 0,
+                gpu_device_id = 9999,
+                nthread = 1,
+                boost_from_average = FALSE,
+                num_tree_per_iteration = 10,
+                max_bin = 32)
> model <- lgb.train(params,
+                    dtrain,
+                    2,
+                    valids,
+                    min_data = 1,
+                    learning_rate = 1,
+                    early_stopping_rounds = 10)
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 232
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] Using GPU Device: Oland, Vendor: Advanced Micro Devices, Inc.
[LightGBM] [Info] Compiling OpenCL Kernel with 16 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 12
[LightGBM] [Info] 40 dense feature groups (0.12 MB) transfered to GPU in 0.004211 secs. 76 sparse feature groups.
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=16 and max_depth=8
[1]: test's rmse:1.10643e-17
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=7 and max_depth=5
[2]: test's rmse:0
```
Do not ever run the following scenario, as it is known to crash even though the log says the CPU is being used (it is NOT actually the case):
* One CPU and one GPU
* `gpu_platform_id = 1`, `gpu_device_id = 0`
```r
> params <- list(objective = "regression",
+                metric = "rmse",
+                device = "gpu",
+                gpu_platform_id = 1,
+                gpu_device_id = 0,
+                nthread = 1,
+                boost_from_average = FALSE,
+                num_tree_per_iteration = 10,
+                max_bin = 32)
> model <- lgb.train(params,
+                    dtrain,
+                    2,
+                    valids,
+                    min_data = 1,
+                    learning_rate = 1,
+                    early_stopping_rounds = 10)
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 232
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] Using requested OpenCL platform 1 device 0
[LightGBM] [Info] Using GPU Device: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, Vendor: Intel(R) Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 16 bins...
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::compute::opencl_error> >'
what(): Invalid Program
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
```
## Multiple CPU and GPU
If you have multiple devices (multiple CPUs and multiple GPUs), you will have to test different `gpu_device_id` and `gpu_platform_id` values to find the combination that targets the CPU/GPU you want to use. Keep in mind that using the integrated graphics card is not directly possible without disabling every dedicated graphics card; see the sketch below for a way to probe combinations without risking your session.
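Because a bad combination crashes the whole process, probing combinations interactively is risky. One defensive option is to try each candidate pair in a throwaway child R process, so that only the child crashes. The sketch below is an illustration under assumptions: it reuses the toy `agaricus` data from the examples above, assumes `Rscript` is on your `PATH`, and treats any nonzero exit status as a failed or crashed combination.
```r
# Hedged sketch: train a tiny model in a child R process for each candidate
# (gpu_platform_id, gpu_device_id) pair; a bad pair only kills the child.
probe <- function(platform_id, device_id) {
  script <- tempfile(fileext = ".R")
  writeLines(sprintf('
library(lightgbm)
data(agaricus.train, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)
params <- list(objective = "regression", metric = "rmse", device = "gpu",
               gpu_platform_id = %d, gpu_device_id = %d, max_bin = 32)
model <- lgb.train(params, dtrain, 2, min_data = 1)
', platform_id, device_id), script)
  status <- system2("Rscript", script, stdout = FALSE, stderr = FALSE)
  unlink(script)
  status == 0  # TRUE means the child trained without crashing
}

for (p in 0:1) {
  for (d in 0:1) {
    cat("gpu_platform_id =", p, "gpu_device_id =", d, ":",
        if (probe(p, d)) "usable" else "failed/crashed", "\n")
  }
}
```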
---
LightGBM GPU Tutorial
==================================
The purpose of this document is to give you a quick step-by-step tutorial on GPU training.
For Windows, please see [GPU Windows Tutorial](./GPU-Windows.md).
We will use the GPU instance on [Microsoft Azure cloud computing platform](https://azure.microsoft.com/) for demonstration, but you can use any machine with modern AMD or NVIDIA GPUs.
GPU Setup
-------------------------
You need to launch an `NV` type instance on Azure (available in the East US, North Central US, South Central US, West Europe and Southeast Asia zones) and select Ubuntu 16.04 LTS as the operating system.
For testing, the smallest `NV6` type virtual machine is sufficient; it includes half of an M60 GPU, with 8 GB of memory, 180 GB/s memory bandwidth and 4,825 GFLOPS of peak computation power. Don't use the `NC` type instances, as their GPUs (K80) are based on an older architecture (Kepler).
First, we need to install the minimal NVIDIA drivers and the OpenCL development environment:
After installing the drivers, reboot the machine with `sudo init 6`.
After about 30 seconds, the server should be up again.
If you are using an AMD GPU, you should download and install the [AMDGPU-Pro](http://support.amd.com/en-us/download/linux) driver, and also install the packages `ocl-icd-libopencl1` and `ocl-icd-opencl-dev`.
Build LightGBM
----------------------------
Now install the necessary build tools and dependencies:
```
sudo apt-get install --no-install-recommends git cmake build-essential libboost-dev libboost-system-dev libboost-filesystem-dev
```
The NV6 GPU instance has a 320 GB ultra-fast SSD mounted at /mnt. Let's use it as our workspace (skip this if you are using your own machine):
```
sudo mkdir -p /mnt/workspace
```
You will see that two binaries are generated: `lightgbm` and `lib_lightgbm.so`.
If you are building on OSX, you probably need to remove the macro `BOOST_COMPUTE_USE_OFFLINE_CACHE` in `src/treelearner/gpu_tree_learner.h` to avoid a known crash bug in Boost.Compute.
Install Python Interface (optional)
-----------------------------------
If you want to use the Python interface of LightGBM, you can install it now (along with some necessary Python package dependencies):
```
sudo apt-get -y install python-pip
sudo python setup.py install
cd ..
```
You need to set an additional parameter `"device" : "gpu"` (along with your other options like `learning_rate`, `num_leaves`, etc) to use GPU in Python.
You can read our [Python Guide](https://github.com/Microsoft/LightGBM/tree/master/examples/python-guide) for more information on how to use the Python interface.
Dataset Preparation
----------------------------
```
ln -s boosting_tree_benchmarks/data/higgs.train
ln -s boosting_tree_benchmarks/data/higgs.test
```
Now we create a configuration file for LightGBM by running the following commands (please copy the entire block and run it as a whole):
```
cat > lightgbm_gpu.conf <<EOF
device=gpu
gpu_platform_id=0
gpu_device_id=0
EOF
echo "num_threads=$(nproc)" >> lightgbm_gpu.conf
```
GPU is enabled in the configuration file we just created by setting `device=gpu`. It will use the first GPU installed on the system by default (`gpu_platform_id=0` and `gpu_device_id=0`).
Run Your First Learning Task on GPU
-----------------------------------
Now we are ready to start GPU training! First we want to verify that the GPU works correctly. Run the following command to train on the GPU, and take note of the AUC after 50 iterations:
```
./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc
```
You should observe a speedup of more than three times on this GPU.
The GPU acceleration can be used on other tasks/metrics (regression, multi-class classification, ranking, etc) as well. For example, we can train the Higgs dataset on GPU as a regression task:
```
./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=regression_l2 metric=l2
```

Further Reading
----------------------------
[GPU Tuning Guide and Performance Comparison](./GPU-Performance.md)
[GPU SDK Correspondence and Device Targeting Table](./GPU-Targets.md)
[GPU Windows Tutorial](./GPU-Windows.md)
---
Does not apply to you if you do not use a third-party antivirus or the default one provided by Windows.
Installing the appropriate OpenCL SDK requires you to download the correct vendor SDK. You need to know which device you are going to run LightGBM on:
* For running on Intel, get Intel SDK for OpenCL: https://software.intel.com/en-us/articles/opencl-drivers
* For running on AMD, get AMD APP SDK: http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/
* For running on NVIDIA, get CUDA Toolkit: https://developer.nvidia.com/cuda-downloads
Further reading and correspondence table (especially if you intend to use cross-platform devices, like an Intel CPU with the AMD APP SDK): [GPU SDK Correspondence and Device Targeting Table](./GPU-Targets.md).
---
Congratulations on reaching this stage!
To learn how to target the correct CPU or GPU for training, please see the [GPU SDK Correspondence and Device Targeting Table](./GPU-Targets.md).
---
## LightGBM Setup and Installation for Python (Python: final step)
Congratulations on reaching this stage!
To learn how to target the correct CPU or GPU for training, please see the [GPU SDK Correspondence and Device Targeting Table](./GPU-Targets.md).
---
## LightGBM Setup and Installation for R (R: final step)
Congratulations on reaching this stage!
To learn how to target the correct CPU or GPU for training, please see the [GPU SDK Correspondence and Device Targeting Table](./GPU-Targets.md).
## Debugging LightGBM crashes in CLI
Now that you have compiled LightGBM, you try it... and you always get a segmentation fault or an undocumented crash with GPU support:
![Segmentation Fault](https://cloud.githubusercontent.com/assets/9083669/25015529/7326860a-207c-11e7-8fc3-320b2be619a6.png)
Please check that you are using the right device, and whether it works with the default `gpu_device_id = 0` and `gpu_platform_id = 0`. If it still does not work with the default values, then you should follow all of the steps below.
You will have to redo the compilation steps for LightGBM to add debugging mode. This involves:
* Deleting the `C:/github_repos/LightGBM/build` folder