- 28 Dec, 2022 1 commit
-
-
Yifei Liu authored
* add parameter data_sample_strategy * abstract GOSS as a sample strategy(GOSS1), togetherwith origial GOSS (Normal Bagging has not been abstracted, so do NOT use it now) * abstract Bagging as a subclass (BAGGING), but original Bagging members in GBDT are still kept * fix some variables * remove GOSS(as boost) and Bagging logic in GBDT * rename GOSS1 to GOSS(as sample strategy) * add warning about use GOSS as boosting_type * a little ; bug * remove CHECK when "gradients != nullptr" * rename DataSampleStrategy to avoid confusion * remove and add some ccomments, followingconvention * fix bug about GBDT::ResetConfig (ObjectiveFunction inconsistencty bet… * add std::ignore to avoid compiler warnings (anpotential fails) * update Makevars and vcxproj * handle constant hessian move resize of gradient vectors out of sample strategy * mark override for IsHessianChange * fix lint errors * rerun parameter_generator.py * update config_auto.cpp * delete redundant blank line * update num_data_ when train_data_ is updated set gradients and hessians when GOSS * check bagging_freq is not zero * reset config_ value merge ResetBaggingConfig and ResetGOSS * remove useless check * add ttests in test_engine.py * remove whitespace in blank line * remove arguments verbose_eval and evals_result * Update tests/python_package_test/test_engine.py reduce num_boost_round Co-authored-by:
James Lamb <jaylamb20@gmail.com> * Update tests/python_package_test/test_engine.py reduce num_boost_round Co-authored-by:
James Lamb <jaylamb20@gmail.com> * Update tests/python_package_test/test_engine.py reduce num_boost_round Co-authored-by:
James Lamb <jaylamb20@gmail.com> * Update tests/python_package_test/test_engine.py reduce num_boost_round Co-authored-by:
James Lamb <jaylamb20@gmail.com> * Update tests/python_package_test/test_engine.py reduce num_boost_round Co-authored-by:
James Lamb <jaylamb20@gmail.com> * Update tests/python_package_test/test_engine.py reduce num_boost_round Co-authored-by:
James Lamb <jaylamb20@gmail.com> * Update src/boosting/sample_strategy.cpp modify warning about setting goss as `boosting_type` Co-authored-by:
James Lamb <jaylamb20@gmail.com> * Update tests/python_package_test/test_engine.py replace load_boston() with make_regression() remove value checks of mean_squared_error in test_sample_strategy_with_boosting() * Update tests/python_package_test/test_engine.py add value checks of mean_squared_error in test_sample_strategy_with_boosting() * Modify warnning about using goss as boosting type * Update tests/python_package_test/test_engine.py add random_state=42 for make_regression() reduce the threshold of mean_square_error * Update src/boosting/sample_strategy.cpp Co-authored-by:
James Lamb <jaylamb20@gmail.com> * remove goss from boosting types in documentation * Update src/boosting/bagging.hpp Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * Update src/boosting/bagging.hpp Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * Update src/boosting/goss.hpp Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * Update src/boosting/goss.hpp Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * rename GOSS with GOSSStrategy * update doc * address comments * fix table in doc * Update include/LightGBM/config.h Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * update documentation * update test case * revert useless change in test_engine.py * add tests for evaluation results in test_sample_strategy_with_boosting * include <string> * change to assert_allclose in test_goss_boosting_and_strategy_equivalent * more tolerance in result checking, due to minor difference in results of gpu versions * change == to np.testing.assert_allclose * fix test case * set gpu_use_dp to true * change --report to --report-level for rstcheck * use gpu_use_dp=true in test_goss_boosting_and_strategy_equivalent * revert unexpected changes of non-ascii characters * revert unexpected changes of non-ascii characters * remove useless changes * allocate gradients_pointer_ and hessians_pointer when necessary * add spaces * remove redundant virtual * include <LightGBM/utils/log.h> for USE_CUDA * check for in test_goss_boosting_and_strategy_equivalent * check for identity in test_sample_strategy_with_boosting * remove cuda option in test_sample_strategy_with_boosting * Update tests/python_package_test/test_engine.py Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * Update tests/python_package_test/test_engine.py Co-authored-by:
James Lamb <jaylamb20@gmail.com> * ResetGradientBuffers after ResetSampleConfig * ResetGradientBuffers after ResetSampleConfig * ResetGradientBuffers after bagging * remove useless code * check objective_function_ instead of gradients * enable rf with goss simplify params in test cases * remove useless changes * allow rf with feature subsampling alone * change position of ResetGradientBuffers * check for dask * add parameter types for data_sample_strategy Co-authored-by:
Guangda Liu <v-guangdaliu@microsoft.com> Co-authored-by:
Yu Shi <shiyu_k1994@qq.com> Co-authored-by:
GuangdaLiu <90019144+GuangdaLiu@users.noreply.github.com> Co-authored-by:
James Lamb <jaylamb20@gmail.com> Co-authored-by:
Nikita Titov <nekit94-08@mail.ru>
-
- 16 Aug, 2022 1 commit
-
-
shiyu1994 authored
* add default definition for GetColWiseData and GetColWiseData * fix warnings of template instantiation * remove files in Makevars and LightGBM.vcxproj
-
- 22 May, 2022 1 commit
-
-
James Lamb authored
-
- 23 Mar, 2022 1 commit
-
-
shiyu1994 authored
* new cuda framework * add histogram construction kernel * before removing multi-gpu * new cuda framework * tree learner cuda kernels * single tree framework ready * single tree training framework * remove comments * boosting with cuda * optimize for best split find * data split * move boosting into cuda * parallel synchronize best split point * merge split data kernels * before code refactor * use tasks instead of features as units for split finding * refactor cuda best split finder * fix configuration error with small leaves in data split * skip histogram construction of too small leaf * skip split finding of invalid leaves stop when no leaf to split * support row wise with CUDA * copy data for split by column * copy data from host to CPU by column for data partition * add synchronize best splits for one leaf from multiple blocks * partition dense row data * fix sync best split from task blocks * add support for sparse row wise for CUDA * remove useless code * add l2 regression objective * sparse multi value bin enabled for CUDA * fix cuda ranking objective * support for number of items <= 2048 per query * speedup histogram construction by interleaving global memory access * split optimization * add cuda tree predictor * remove comma * refactor objective and score updater * before use struct * use structure for split information * use structure for leaf splits * return CUDASplitInfo directly after finding best split * split with CUDATree directly * use cuda row data in cuda histogram constructor * clean src/treelearner/cuda * gather shared cuda device functions * put shared CUDA functions into header file * change smaller leaf from <= back to < for consistent result with CPU * add tree predictor * remove useless cuda_tree_predictor * predict on CUDA with pipeline * add global sort algorithms * add global argsort for queries with many items in ranking tasks * remove limitation of maximum number of items per query in ranking * add cuda metrics * fix CUDA AUC * remove debug code * add regression metrics * remove useless file * don't use mask in shuffle reduce * add more regression objectives * fix cuda mape loss add cuda xentropy loss * use template for different versions of BitonicArgSortDevice * add multiclass metrics * add ndcg metric * fix cross entropy objectives and metrics * fix cross entropy and ndcg metrics * add support for customized objective in CUDA * complete multiclass ova for CUDA * separate cuda tree learner * use shuffle based prefix sum * clean up cuda_algorithms.hpp * add copy subset on CUDA * add bagging for CUDA * clean up code * copy gradients from host to device * support bagging without using subset * add support of bagging with subset for CUDAColumnData * add support of bagging with subset for dense CUDARowData * refactor copy sparse subrow * use copy subset for column subset * add reset train data and reset config for CUDA tree learner add deconstructors for cuda tree learner * add USE_CUDA ifdef to cuda tree learner files * check that dataset doesn't contain CUDA tree learner * remove printf debug information * use full new cuda tree learner only when using single GPU * disable all CUDA code when using CPU version * recover main.cpp * add cpp files for multi value bins * update LightGBM.vcxproj * update LightGBM.vcxproj fix lint errors * fix lint errors * fix lint errors * update Makevars fix lint errors * fix the case with 0 feature and 0 bin fix split finding for invalid leaves create cuda column data when loaded from bin file * fix lint errors hide GetRowWiseData when cuda is not used * recover default device type to cpu * fix na_as_missing case fix cuda feature meta information * fix UpdateDataIndexToLeafIndexKernel * create CUDA trees when needed in CUDADataPartition::UpdateTrainScore * add refit by tree for cuda tree learner * fix test_refit in test_engine.py * create set of large bin partitions in CUDARowData * add histogram construction for columns with a large number of bins * add find best split for categorical features on CUDA * add bitvectors for categorical split * cuda data partition split for categorical features * fix split tree with categorical feature * fix categorical feature splits * refactor cuda_data_partition.cu with multi-level templates * refactor CUDABestSplitFinder by grouping task information into struct * pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder * fix misuse of reference * remove useless changes * add support for path smoothing * virtual destructor for LightGBM::Tree * fix overlapped cat threshold in best split infos * reset histogram pointers in data partition and spllit finder in ResetConfig * comment useless parameter * fix reverse case when na is missing and default bin is zero * fix mfb_is_na and mfb_is_zero and is_single_feature_column * remove debug log * fix cat_l2 when one-hot fix gradient copy when data subset is used * switch shared histogram size according to CUDA version * gpu_use_dp=true when cuda test * revert modification in config.h * fix setting of gpu_use_dp=true in .ci/test.sh * fix linter errors * fix linter error remove useless change * recover main.cpp * separate cuda_exp and cuda * fix ci bash scripts add description for cuda_exp * add USE_CUDA_EXP flag * switch off USE_CUDA_EXP * revert changes in python-packages * more careful separation for USE_CUDA_EXP * fix CUDARowData::DivideCUDAFeatureGroups fix set fields for cuda metadata * revert config.h * fix test settings for cuda experimental version * skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version * fix lint issue by adding a blank line * fix lint errors by resorting imports * fix lint errors by resorting imports * fix lint errors by resorting imports * merge cuda.yml and cuda_exp.yml * update python version in cuda.yml * remove cuda_exp.yml * remove unrelated changes * fix compilation warnings fix cuda exp ci task name * recover task * use multi-level template in histogram construction check split only in debug mode * ignore NVCC related lines in parameter_generator.py * update job name for CUDA tests * apply review suggestions * Update .github/workflows/cuda.yml Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * Update .github/workflows/cuda.yml Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * update header * remove useless TODOs * remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062 * #include <LightGBM/utils/log.h> for USE_CUDA_EXP only * fix include order * fix include order * remove extra space * address review comments * add warning when cuda_exp is used together with deterministic * add comment about gpu_use_dp in .ci/test.sh * revert changing order of included headers Co-authored-by:
Yu Shi <shiyu1994@qq.com> Co-authored-by:
Nikita Titov <nekit94-08@mail.ru>
-
- 24 Dec, 2020 1 commit
-
-
Belinda Trotta authored
* Add Eigen library. * Working for simple test. * Apply changes to config params. * Handle nan data. * Update docs. * Add test. * Only load raw data if boosting=gbdt_linear * Remove unneeded code. * Minor updates. * Update to work with sk-learn interface. * Update to work with chunked datasets. * Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters. * Save raw data in binary dataset file. * Update docs and fix parameter checking. * Fix dataset loading. * Add test for regularization. * Fix bugs when saving and loading tree. * Add test for load/save linear model. * Remove unneeded code. * Fix case where not enough leaf data for linear model. * Simplify code. * Speed up code. * Speed up code. * Simplify code. * Speed up code. * Fix bugs. * Working version. * Store feature data column-wise (not fully working yet). * Fix bugs. * Speed up. * Speed up. * Remove unneeded code. * Small speedup. * Speed up. * Minor updates. * Remove unneeded code. * Fix bug. * Fix bug. * Speed up. * Speed up. * Simplify code. * Remove unneeded code. * Fix bug, add more tests. * Fix bug and add test. * Only store numerical features * Fix bug and speed up using templates. * Speed up prediction. * Fix bug with regularisation * Visual studio files. * Working version * Only check nans if necessary * Store coeff matrix as an array. * Align cache lines * Align cache lines * Preallocation coefficient calculation matrices * Small speedups * Small speedup * Reverse cache alignment changes * Change to dynamic schedule * Update docs. * Refactor so that linear tree learner is not a separate class. * Add refit capability. * Speed up * Small speedups. * Speed up add prediction to score. * Fix bug * Fix bug and speed up. * Speed up dataload. * Speed up dataload * Use vectors instead of pointers * Fix bug * Add OMP exception handling. * Change return type of LGBM_BoosterGetLinear to bool * Change return type of LGBM_BoosterGetLinear back to int, only parameter type needed to change * Remove unused internal_parent_ property of tree * Remove unused parameter to CreateTreeLearner * Remove reference to LinearTreeLearner * Minor style issues * Remove unneeded check * Reverse temporary testing change * Fix Visual Studio project files * Restore LightGBM.vcxproj.filters * Speed up * Speed up * Simplify code * Update docs * Simplify code * Initialise storage space for max num threads * Move Eigen to include directory and delete unused files * Remove old files. * Fix so it compiles with mingw * Fix gpu tree learner * Change AddPredictionToScore back to const * Fix python lint error * Fix C++ lint errors * Change eigen to a submodule * Update comment * Add the eigen folder * Try to fix build issues with eigen * Remove eigen files * Add eigen as submodule * Fix include paths * Exclude eigen files from Python linter * Ignore eigen folders for pydocstyle * Fix C++ linting errors * Fix docs * Fix docs * Exclude eigen directories from doxygen * Update manifest to include eigen * Update build_r to include eigen files * Fix compiler warnings * Store raw feature data as float * Use float for calculating linear coefficients * Remove eigen directory from GLOB * Don't compile linear model code when building R package * Fix doxygen issue * Fix lint issue * Fix lint issue * Remove uneeded code * Restore delected lines * Restore delected lines * Change return type of has_raw to bool * Update docs * Rename some variables and functions for readability * Make tree_learner parameter const in AddScore * Fix style issues * Pass vectors as const reference when setting tree properties * Make temporary storage of serial_tree_learner mutable so we can make the object's methods const * Remove get_raw_size, use num_numeric_features instead * Fix typo * Make contains_nan_ and any_nan_ properties immutable again * Remove data_has_nan_ property of tree * Remove temporary test code * Make linear_tree a dataset param * Fix lint error * Make LinearTreeLearner a separate class * Fix lint errors * Fix lint error * Add linear_tree_learner.o * Simulate omp_get_max_threads if openmp is not available * Update PushOneData to also store raw data. * Cast size to int * Fix bug in ReshapeRaw * Speed up code with multithreading * Use OMP_NUM_THREADS * Speed up with multithreading * Update to use ArrayToString * Fix tests * Fix test * Fix bug introduced in merge * Minor updates * Update docs
-
- 21 Nov, 2020 1 commit
-
-
James Lamb authored
* [R-package] Remove CLI-only objects * more guards * more guards * variable not string * simplify fix * revert build_r.R changes * move define of global_timer Co-authored-by:Nikita Titov <nekit94-08@mail.ru>
-
- 13 Nov, 2020 1 commit
-
-
shiyu1994 authored
* store without offset in multi_val_dense_bin * fix offset bug * add comment for offset * add comment for bin type selection * faster operations for offset * keep most freq bin in histogram for multi val dense * use original feature iterators * consider 9 cases (3 x 3) for multi val bin construction * fix dense bin setting * fix bin data in multi val group * fix offset of the first feature histogram * use float hist buf * avx in histogram construction * use avx for hist construction without prefetch * vectorize bin extraction * use only 128 vec * use avx2 * use vectorization for sparse row wise * add bit size for multi val dense bin * float with no vectorization * change multithreading strategy to dynamic * remove intrinsic header * fix dense multi val col copy * remove bit size * use large enough block size when the bin number is large * calc min block size by sparsity * rescale gradients * rollback gradients scaling * single precision histogram buffer as an option * add float hist buffer with thread buffer * fix setting zero in hist data * fix hist begin pointer in tree learners * remove debug logs * remove omp simd * update Makevars of R-package * fix feature group binary storing * two row wise for double hist buffer * add subfeature for two row wise * remove useless code and fix two row wise * refactor code * grouping the dense feature groups can get sparse multi val bin * clean format problems * one thread for two blocks in sep row wise * use ordered gradients for sep row wise * fix grad ptr * ordered grad with combined block for sep row wise * fix block threading * use the same min block size * rollback share min block size * remove logs * Update src/io/dataset.cpp Co-authored-by:
Guolin Ke <guolin.ke@outlook.com> * fix parameter description * remove sep_row_wise * remove check codes * add check for empty multi val bin * fix lint error * rollback changes in config.h * Apply suggestions from code review Co-authored-by:
Ubuntu <shiyu@gbdt-04.ren3kv4wanvufliwrpy4k03lsf.xx.internal.cloudapp.net> Co-authored-by:
Guolin Ke <guolin.ke@outlook.com>
-
- 08 Oct, 2020 1 commit
-
-
James Lamb authored
* [R-package] update DESCRIPTION per CRAN comments * newlines * Apply suggestions from code review Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * more fixes * update Rbuildignore * more changes * more changes per CRAN response * add email * run examples in CI * add newest CRAN response * add Solaris patch * update patch * another attempt at ifaddrs patch * fix unnecessary comment * update configure * comments * bump version * tabs * fix address alignment, required by cran (#3415) * fix dataset binary file alignment * many fixes * fix warnings * fix bug * Update file_io.cpp * Update file_io.cpp * simplify code * Apply suggestions from code review * general * remove unneeded alignment * Update file_io.h * int32 to byte8 alignment * Apply suggestions from code review * Apply suggestions from code review * [R-package] add new copyright holder in DESCRIPTION (#3409) * [R-package] add new copyright holder in DESCRIPTION * fix role * fixing conflicts * [R-package] add new copyright holder in DESCRIPTION (#3409) * [R-package] add new copyright holder in DESCRIPTION * fix role * trying to fix conflicts * more fixes * this will work * update cran-comments * simplify solaris, add more testing docs * stuff * remove rchck docs * Apply suggestions from code review Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * remove extra use of cat() * change solaris check * update docs * remove testing code * fix warning about cleanup not having execute permissions * fix cmake builds * remove blank line Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> Co-authored-by:
Guolin Ke <guolin.ke@outlook.com>
-
- 29 Jul, 2020 2 commits
-
-
James Lamb authored
Co-authored-by:Nikita Titov <nekit94-08@mail.ru>
-
James Lamb authored
* [R-package] make package installable with CRAN toolchain (fixes #2960) * Apply suggestions from code review Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * remove GPU stuff * use wildcard to find objects to build * use -lomp * build configure before moving files * using wildcard for objects * Update .github/workflows/main.yml Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * add explicit objects back * reduce allowed R CMD check NOTEs and catch stderr from build-cran-package on Windows * fixing things * pin autoconf version * show diff * add automake back * run less checks * command was in the wrong place * fix autoconf version * change strategy for handling configure * fix Rbuildignore * fix NOTEs * fix notes about unrecognized files * fixing extra files * remove USE_R35 * add OpenMP check for Mac CRAN build * run all checks * Apply suggestions from code review Co-authored-by:
Nikita Titov <nekit94-08@mail.ru> * suggestions from code review * undo indenting * remove 03 from Makevars.win.in * update language about OpenMP in configure script * checking if configure.ac check works * add autoconf back * remove testing code in configure.ac * more fixes for CI on configure script * print git diff * add VERSION.txt when checking configure * fix relative paths * remove git diff Co-authored-by:
Nikita Titov <nekit94-08@mail.ru>
-