    Initial GPU acceleration support for LightGBM (#368) · 0bb4a825
    Huan Zhang authored
    * add dummy gpu solver code
    
    * initial GPU code
    
    * fix crash bug
    
    * first working version
    
    * use asynchronous copy
    
    * use a better kernel for root
    
    * parallel read histogram
    
    * sparse features now work, but without acceleration; computed on CPU
    
    * compute sparse feature on CPU simultaneously
    
    * fix big bug; add gpu selection; add kernel selection
    
    * better debugging
    
    * clean up
    
    * add feature scatter
    
    * Add sparse_threshold control
    
    * fix a bug in feature scatter
    
    * clean up debug
    
    * temporarily add OpenCL kernels for k=64,256
    
    * fix up CMakeList and definition USE_GPU
    
    * add OpenCL kernels as string literals
    
    * Add boost.compute as a submodule
    
    * add boost dependency into CMakeList
    
    * fix opencl pragma
    
    * use pinned memory for histogram
    
    * use pinned buffer for gradients and hessians
    
    * better debugging message
    
    * add double precision support on GPU
    
    * fix boost version in CMakeList
    
    * Add a README
    
    * reconstruct GPU initialization code for ResetTrainingData
    
    * move data to GPU in parallel
    
    * fix a bug during feature copy
    
    * update gpu kernels
    
    * update gpu code
    
    * initial port to LightGBM v2
    
    * speedup GPU data loading process
    
    * Add 4-bit bin support to GPU
    
    * re-add sparse_threshold parameter
    
    * remove kMaxNumWorkgroups and allow an unlimited number of features
    
    * add feature mask support for skipping unused features
    
    * enable kernel cache
    
    * use GPU kernels without feature masks when all features are used
    
    * REAdme.
    
    * REAdme.
    
    * update README
    
    * fix typos (#349)
    
    * change the default compiler to gcc on Apple
    
    * clean vscode related file
    
    * refine api of constructing from sampling data.
    
    * fix bug in the last commit.
    
    * more efficient algorithm to sample k from n.
    
    * fix bug in filter bin
    
    * change to boost from average output.
    
    * fix tests.
    
    * only stop training when all classes are finished in multi-class.
    
    * limit the max tree output. change hessian in multi-class objective.
    
    * robust tree model loading.
    
    * fix test.
    
    * convert the probabilities to raw score in boost_from_average of classification.
    
    * fix the average label for binary classification.
    
    * Add boost_from_average to docs (#354)
    
    * don't use "ConvertToRawScore" for self-defined objective function.
    
    * boost_from_average doesn't seem to work well in binary classification. remove it.
    
    * For a better jump link (#355)
    
    * Update Python-API.md
    
    * for a better jump in page
    
    A space is needed between `#` and the header's content according to GitHub's markdown format [guideline](https://guides.github.com/features/mastering-markdown/)
    
    After adding the spaces, we can jump to the exact position in the page by clicking the link.
    
    * fixed something mentioned by @wxchan
    
    * Update Python-API.md
    
    * add FitByExistingTree.
    
    * adapt GPU tree learner for FitByExistingTree
    
    * avoid NaN output.
    
    * update boost.compute
    
    * fix typos (#361)
    
    * fix broken links (#359)
    
    * update README
    
    * disable GPU acceleration by default
    
    * fix image url
    
    * cleanup debug macro
    
    * remove old README
    
    * do not save sparse_threshold_ in FeatureGroup
    
    * add details for new GPU settings
    
    * ignore submodule when doing pep8 check
    
    * allocate workspace for at least one thread during building Feature4
    
    * move sparse_threshold to class Dataset
    
    * remove duplicated code in GPUTreeLearner::Split
    
    * Remove duplicated code in FindBestThresholds and BeforeFindBestSplit
    
    * do not rebuild ordered gradients and hessians for sparse features
    
    * support feature groups in GPUTreeLearner
    
    * Initial parallel learners with GPU support
    
    * add option device, cleanup code
    
    * clean up FindBestThresholds; add some omp parallel
    
    * constant hessian optimization for GPU
    
    * Fix GPUTreeLearner crash when there is zero feature
    
    * use np.testing.assert_almost_equal() to compare lists of floats in tests
    
    * travis for GPU
config.cpp 14.9 KB