- 23 Feb, 2021 1 commit
Belinda Trotta authored
* Update docs to note that pred_contrib is not available for linear trees
* Add warning in code
* Change warning to error
- 21 Feb, 2021 1 commit
mjmckp authored
* Fix index out-of-range exception generated by BaggingHelper on small datasets. Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients is empty when the cnt input value to the function is zero.
* Update goss.hpp
* Update goss.hpp
* Add API method LGBM_BoosterPredictForMats, which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires data given as a contiguous array)
* Fix incorrect upstream merge
* Add link to LightGBM.NET
* Fix indenting to 2 spaces
* Dummy edit to trigger CI
* Dummy edit to trigger CI
* Remove duplicate functions from merge
* Fix evaluation of linear trees with a single leaf. Trees without linear models at the leaves always handle num_leaves = 1 as a special case and directly output the leaf value. Linear trees were missing this special-case handling, and hence had the following issues:
  * Calling Tree::Predict or Tree::PredictByMap would cause an access violation exception when attempting to access the first value of the empty split_feature_ array in GetLeaf.
  * PredictionFunLinear would either cause an access violation or go into an infinite loop when attempting to do the equivalent of GetLeaf.
  Note also that PredictionFun does not need the same changes as PredictionFunLinear, since both are only called by Tree::AddPredictionToScore, which has a special case for (!is_linear_ && num_leaves_ <= 1) that precludes calling PredictionFun.

Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
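The empty-gradient case in the BaggingHelper fix above can be sketched as a guard before indexing (an illustrative sketch with assumed names, not the exact goss.hpp code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using score_t = float;

// Illustrative guard for the bug described above: tmp_gradients is empty
// when the bag contains no rows (cnt == 0), so tmp_gradients[top_k - 1]
// must not be read before checking for that case.
score_t TopKThreshold(std::vector<score_t> tmp_gradients, std::size_t top_k) {
  if (tmp_gradients.empty() || top_k == 0) {
    return 0.0f;  // no sampled rows: nothing passes the threshold
  }
  top_k = std::min(top_k, tmp_gradients.size());
  // GOSS keeps the rows with the largest gradients; sort the top-k to the front
  std::partial_sort(tmp_gradients.begin(), tmp_gradients.begin() + top_k,
                    tmp_gradients.end(), std::greater<score_t>());
  return tmp_gradients[top_k - 1];
}
```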
- 19 Feb, 2021 1 commit
James Lamb authored
* [docs] Change some 'parallel learning' references to 'distributed learning'
* found a few more
* one more reference
- 13 Feb, 2021 3 commits
Belinda Trotta authored
* Update docs about linear tree and monotone constraints
* Fix punctuation
Benjamin Sergeant authored
* openmp_wrapper.h stubs' signatures use __GOMP_NOTHROW (fix #3915). The OpenMP stubs did not use the noexcept attribute which is present in the gcc version of omp.h, which triggered compilation errors as seen below. dmlc-core uses the same technique and macro.

    /usr/lib/gcc/x86_64-linux-gnu/9/include/omp.h:114:12: error: declaration of ‘int omp_get_thread_num() noexcept’ has a different exception specifier
      114 | extern int omp_get_thread_num (void) __GOMP_NOTHROW;
          |            ^~~~~~~~~~~~~~~~~~
    ...
    xxx/include/LightGBM/utils/openmp_wrapper.h:81:14: note: from previous declaration ‘int omp_get_thread_num()’
       81 | inline int omp_get_thread_num() {return 0;}
          |            ^~~~~~~~~~~~~~~~~~

* move __GOMP_NOTHROW definition into the no-OpenMP stub conditional branch
* Update include/LightGBM/utils/openmp_wrapper.h ("Yes, makes sense, just changed it")
* add NOLINT macro to disable cpplint on a safe line of code

Co-authored-by: James Lamb <jaylamb20@gmail.com>
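The stub pattern behind this fix can be sketched as follows (a minimal illustration, not the actual openmp_wrapper.h; gcc's omp.h defines __GOMP_NOTHROW as throw() for C++, and noexcept is used here as the modern equivalent):

```cpp
#include <cassert>

// When building without OpenMP, stub functions must repeat the same
// exception specifier that gcc's omp.h attaches via __GOMP_NOTHROW;
// otherwise the two declarations conflict at compile time.
#ifndef __GOMP_NOTHROW
#ifdef __cplusplus
#define __GOMP_NOTHROW noexcept
#else
#define __GOMP_NOTHROW __attribute__((__nothrow__))
#endif
#endif

inline int omp_get_thread_num() __GOMP_NOTHROW { return 0; }
inline int omp_get_num_threads() __GOMP_NOTHROW { return 1; }
```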
mjmckp authored
Fix access violation exception that can occur during invocation of the loop lambda function when inner_start >= inner_end in the 'For' template (#3936). In particular, this can occur in Tree::AddPredictionToScore on line 291, where the loop lambda function body (created by the PredictionFun macro) dereferences used_data_indices[start]. The particular case which triggered this exception was:
* start = 0
* end = 93,203
* n_block = 56
* min_block_size = 512
for which the BlockInfo method gave:
* n_block = 56
* num_inner = 1,696
and so, for the case i = 55 (i.e., the last block in the loop), we get inner_start = start + num_inner * i = 93,280, which is greater than 'end' and hence triggers the exception.
* Change formatting of proposed modification

Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
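The arithmetic above can be reproduced in a small sketch of the blocked loop (names and the alignment rule are assumptions, not the exact LightGBM 'For'/BlockInfo code): because the per-block size is rounded up for alignment, the last block's inner_start can land past 'end', and the added guard skips such empty blocks instead of invoking the lambda on them.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Hypothetical sketch: partition [start, end) into n_block chunks whose
// size is rounded up to a multiple of 32 (as BlockInfo's size alignment
// does), and invoke fn on each non-empty chunk.
inline void BlockedFor(int start, int end, int n_block,
                       const std::function<void(int, int)>& fn) {
  int num_inner = (end - start + n_block - 1) / n_block;
  num_inner = ((num_inner + 31) / 32) * 32;  // size alignment causes overshoot
  for (int i = 0; i < n_block; ++i) {
    int inner_start = start + num_inner * i;
    int inner_end = std::min(end, inner_start + num_inner);
    if (inner_start >= inner_end) continue;  // the fix: skip empty trailing blocks
    fn(inner_start, inner_end);
  }
}
```

With the commit's numbers (start = 0, end = 93,203, n_block = 56), num_inner comes out to 1,696 and the block at i = 55 starts at 93,280 > end, so the guard fires.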
- 03 Feb, 2021 1 commit
Chen Yufei authored
* Add new task type: "save_binary".
* Add documentation for task "save_binary".
- 31 Jan, 2021 1 commit
Nikita Titov authored
* document CUDA version support
* address review comments
* collapse CUDA section in the guide
* remove Clang support from CUDA docs, as we have never tested it
- 28 Jan, 2021 1 commit
Nikita Titov authored
- 25 Jan, 2021 1 commit
shiyu1994 authored
- 21 Jan, 2021 1 commit
Nikita Titov authored
- 18 Jan, 2021 2 commits
James Lamb authored
* [python-package] expand documentation on 'group' for ranking task
* add R package
* update Query Data section
* Apply suggestions from code review
* fix typo in group example
* regenerate parameters
* Apply suggestions from code review
* regenerate R docs

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
James Lamb authored
* [R-package] enable use of trees with linear models at leaves (fixes #3319)
* remove problematic pragmas
* fix tests
* try to fix build scripts
* try fixing pragma check
* more pragma checks
* ok fix pragma stuff for real
* empty commit
* regenerate documentation
* try skipping test
* uncomment CI
* add note on missing value types for R
* add tests on saving and re-loading booster
- 11 Jan, 2021 1 commit
Chip Kerchner authored
- 07 Jan, 2021 1 commit
Belinda Trotta authored
* Fix compiler warnings caused by implicit type conversion
* Fix more warnings
* Fix more warnings
- 03 Jan, 2021 1 commit
sisco0 authored
Compile warnings have been fixed
- 29 Dec, 2020 1 commit
James Lamb authored
* [docs] add doc on min_data_in_leaf approximation (fixes #3634)
* Fix capital letter

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
- 28 Dec, 2020 1 commit
Nikita Titov authored
* small code and docs refactoring
* Update CMakeLists.txt
* Update .vsts-ci.yml
* Update test.sh
* continue
* continue
* revert stable sort for all-unique values
- 24 Dec, 2020 1 commit
Belinda Trotta authored
* Add Eigen library.
* Working for simple test.
* Apply changes to config params.
* Handle nan data.
* Update docs.
* Add test.
* Only load raw data if boosting=gbdt_linear
* Remove unneeded code.
* Minor updates.
* Update to work with sk-learn interface.
* Update to work with chunked datasets.
* Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters.
* Save raw data in binary dataset file.
* Update docs and fix parameter checking.
* Fix dataset loading.
* Add test for regularization.
* Fix bugs when saving and loading tree.
* Add test for load/save linear model.
* Remove unneeded code.
* Fix case where not enough leaf data for linear model.
* Simplify code.
* Speed up code.
* Speed up code.
* Simplify code.
* Speed up code.
* Fix bugs.
* Working version.
* Store feature data column-wise (not fully working yet).
* Fix bugs.
* Speed up.
* Speed up.
* Remove unneeded code.
* Small speedup.
* Speed up.
* Minor updates.
* Remove unneeded code.
* Fix bug.
* Fix bug.
* Speed up.
* Speed up.
* Simplify code.
* Remove unneeded code.
* Fix bug, add more tests.
* Fix bug and add test.
* Only store numerical features
* Fix bug and speed up using templates.
* Speed up prediction.
* Fix bug with regularization
* Visual Studio files.
* Working version
* Only check nans if necessary
* Store coeff matrix as an array.
* Align cache lines
* Align cache lines
* Preallocate coefficient calculation matrices
* Small speedups
* Small speedup
* Reverse cache alignment changes
* Change to dynamic schedule
* Update docs.
* Refactor so that linear tree learner is not a separate class.
* Add refit capability.
* Speed up
* Small speedups.
* Speed up add prediction to score.
* Fix bug
* Fix bug and speed up.
* Speed up dataload.
* Speed up dataload
* Use vectors instead of pointers
* Fix bug
* Add OMP exception handling.
* Change return type of LGBM_BoosterGetLinear to bool
* Change return type of LGBM_BoosterGetLinear back to int; only the parameter type needed to change
* Remove unused internal_parent_ property of tree
* Remove unused parameter to CreateTreeLearner
* Remove reference to LinearTreeLearner
* Minor style issues
* Remove unneeded check
* Reverse temporary testing change
* Fix Visual Studio project files
* Restore LightGBM.vcxproj.filters
* Speed up
* Speed up
* Simplify code
* Update docs
* Simplify code
* Initialise storage space for max num threads
* Move Eigen to include directory and delete unused files
* Remove old files.
* Fix so it compiles with mingw
* Fix gpu tree learner
* Change AddPredictionToScore back to const
* Fix python lint error
* Fix C++ lint errors
* Change eigen to a submodule
* Update comment
* Add the eigen folder
* Try to fix build issues with eigen
* Remove eigen files
* Add eigen as submodule
* Fix include paths
* Exclude eigen files from Python linter
* Ignore eigen folders for pydocstyle
* Fix C++ linting errors
* Fix docs
* Fix docs
* Exclude eigen directories from doxygen
* Update manifest to include eigen
* Update build_r to include eigen files
* Fix compiler warnings
* Store raw feature data as float
* Use float for calculating linear coefficients
* Remove eigen directory from GLOB
* Don't compile linear model code when building R package
* Fix doxygen issue
* Fix lint issue
* Fix lint issue
* Remove unneeded code
* Restore deleted lines
* Restore deleted lines
* Change return type of has_raw to bool
* Update docs
* Rename some variables and functions for readability
* Make tree_learner parameter const in AddScore
* Fix style issues
* Pass vectors as const reference when setting tree properties
* Make temporary storage of serial_tree_learner mutable so we can make the object's methods const
* Remove get_raw_size, use num_numeric_features instead
* Fix typo
* Make contains_nan_ and any_nan_ properties immutable again
* Remove data_has_nan_ property of tree
* Remove temporary test code
* Make linear_tree a dataset param
* Fix lint error
* Make LinearTreeLearner a separate class
* Fix lint errors
* Fix lint error
* Add linear_tree_learner.o
* Simulate omp_get_max_threads if OpenMP is not available
* Update PushOneData to also store raw data.
* Cast size to int
* Fix bug in ReshapeRaw
* Speed up code with multithreading
* Use OMP_NUM_THREADS
* Speed up with multithreading
* Update to use ArrayToString
* Fix tests
* Fix test
* Fix bug introduced in merge
* Minor updates
* Update docs
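The core idea behind the linear-tree feature introduced above — fitting a linear model on the rows that fall into each leaf, instead of predicting a constant — can be sketched for a single feature (all names here are illustrative, not LightGBM's API; LightGBM itself solves a regularized multi-feature least squares via Eigen):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative sketch: a leaf predicting coef * x + intercept, fit by
// ordinary least squares, with a fallback to the mean when the leaf has
// too little data or no feature variance (the "not enough leaf data for
// linear model" case mentioned in the commit).
struct LinearLeaf {
  double coef = 0.0;
  double intercept = 0.0;

  void Fit(const std::vector<double>& x, const std::vector<double>& y) {
    const double n = static_cast<double>(x.size());
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < x.size(); ++i) {
      sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;
    if (x.size() < 2 || std::fabs(denom) < 1e-12) {
      coef = 0.0;                            // fall back to a constant leaf
      intercept = x.empty() ? 0.0 : sy / n;  // mean of the leaf's targets
      return;
    }
    coef = (n * sxy - sx * sy) / denom;
    intercept = (sy - coef * sx) / n;
  }

  double Predict(double x) const { return coef * x + intercept; }
};
```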
- 11 Dec, 2020 1 commit
James Lamb authored
* [docs] Add details to docs on improving training speed
* formatting
* fix link
* fix formatting
* replace 'performance' with 'accuracy' and mention learning_rate
* Apply suggestions from code review
* regenerate docs from config.h

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
- 08 Dec, 2020 1 commit
Alberto Ferreira authored
* Fix LightGBM models' locale sensitivity and improve read/write performance. When Java is used, the default C++ locale is broken. This is true for Java providers that use the C API, and even for Python models that require JEP. This patch solves that issue by making model reads/writes insensitive to such settings. To achieve this, within the model read/write codebase:
  - C++ streams are imbued with the classic locale
  - Calls to functions that depend on the locale are replaced
  - The default locale is not changed!
  This approach means:
  - The user's locale is never tampered with, avoiding issues such as https://github.com/microsoft/LightGBM/issues/2979 with the previous approach https://github.com/microsoft/LightGBM/pull/2891
  - Datasets can still be read according to the user's locale
  - The model file has a single format independent of locale
  Changes:
  - Add CommonC namespace, which provides faster locale-independent versions of Common's methods
  - Model code makes conversions through CommonC
  - Clean up unused Common methods
  - Performance improvements: use fast libraries for locale-agnostic conversion: value->string via https://github.com/fmtlib/fmt, string->double via https://github.com/lemire/fast_double_parser (10x faster double parsing according to their benchmark)
  Bugfixes:
  - https://github.com/microsoft/LightGBM/issues/2500
  - https://github.com/microsoft/LightGBM/issues/2890
  - https://github.com/ninia/jep/issues/205 (as it is related to LightGBM as well)
* Align CommonC namespace
* Add new external_libs/ to python setup
* Try fast_double_parser fix #1: testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe; if it works it should fix more LightGBM builds
* CMake: attempt to link fmt without explicit PUBLIC tag
* Exclude external_libs from linting
* Add external_libs to MANIFEST.in
* Set dynamic linking option for fmt
* Fix linting issues
* Try to fix lint includes
* Try to pass fPIC with static fmt lib
* Try CMake P_I_C option with fmt library
* [R-package] Add CMake support for R and CRAN
* Clean up CMakeLists
* Try fmt hack to remove stdout
* Switch to header-only mode
* Add PRIVATE argument to target_link_libraries
* Use fmt in header-only mode
* Remove CMakeLists comment
* Change OpenMP to PUBLIC linking on Mac
* Update fmt submodule to 7.1.2
* Use fmt in header-only mode
* Remove fmt from CMakeLists.txt
* Upgrade fast_double_parser to v0.2.0
* Revert "Add PRIVATE argument to target_link_libraries" (reverts commit 3dd45dde7b92531b2530ab54522bb843c56227a7)
* Address James Lamb's comments
* Update R-package/.Rbuildignore
* Upgrade to fast_double_parser v0.3.0 (Solaris support)
* Use legacy code only on Solaris
* Fix lint issues
* Fix comment
* Address StrikerRUS's comments (Solaris ifdef)
* Change header guards

Co-authored-by: James Lamb <jaylamb20@gmail.com>
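The stream-imbuing technique described above can be sketched as follows (a minimal illustration, not LightGBM's actual serialization code):

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Imbuing the stream with the classic "C" locale guarantees '.' as the
// decimal separator regardless of the process-wide locale an embedding
// runtime (e.g. a JVM) may have installed, so the model file keeps a
// single format while the user's locale is never tampered with.
inline std::string DoubleToModelString(double value) {
  std::ostringstream ss;
  ss.imbue(std::locale::classic());  // locale-independent formatting
  ss.precision(17);                  // enough digits to round-trip a double
  ss << value;
  return ss.str();
}
```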
- 05 Dec, 2020 1 commit
Chen Yufei authored
* Check that max_bin, etc. match the config when using a binary dataset.
* Check that max_bin_by_feature and bin_construct_sample_cnt match the config.
- 24 Nov, 2020 1 commit
shiyu1994 authored
Fix num_total_bin_ and bin_offsets_ of FeatureGroup when a dense multi-val feature group with a non-zero most-frequent bin is the first feature group of the dataset.
- 23 Nov, 2020 1 commit
shiyu1994 authored
* remove max_block_size_ in train states (fix #3570)
* avoid zero elements per row
* add min constraint for min_block_size_
- 15 Nov, 2020 1 commit
Nikita Titov authored
- 14 Nov, 2020 2 commits
- 13 Nov, 2020 1 commit
shiyu1994 authored
* store without offset in multi_val_dense_bin
* fix offset bug
* add comment for offset
* add comment for bin type selection
* faster operations for offset
* keep most freq bin in histogram for multi val dense
* use original feature iterators
* consider 9 cases (3 x 3) for multi val bin construction
* fix dense bin setting
* fix bin data in multi val group
* fix offset of the first feature histogram
* use float hist buf
* avx in histogram construction
* use avx for hist construction without prefetch
* vectorize bin extraction
* use only 128 vec
* use avx2
* use vectorization for sparse row wise
* add bit size for multi val dense bin
* float with no vectorization
* change multithreading strategy to dynamic
* remove intrinsic header
* fix dense multi val col copy
* remove bit size
* use large enough block size when the bin number is large
* calc min block size by sparsity
* rescale gradients
* rollback gradients scaling
* single precision histogram buffer as an option
* add float hist buffer with thread buffer
* fix setting zero in hist data
* fix hist begin pointer in tree learners
* remove debug logs
* remove omp simd
* update Makevars of R-package
* fix feature group binary storing
* two row wise for double hist buffer
* add subfeature for two row wise
* remove useless code and fix two row wise
* refactor code
* grouping the dense feature groups can get sparse multi val bin
* clean format problems
* one thread for two blocks in sep row wise
* use ordered gradients for sep row wise
* fix grad ptr
* ordered grad with combined block for sep row wise
* fix block threading
* use the same min block size
* rollback share min block size
* remove logs
* Update src/io/dataset.cpp
* fix parameter description
* remove sep_row_wise
* remove check codes
* add check for empty multi val bin
* fix lint error
* rollback changes in config.h
* Apply suggestions from code review

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Ubuntu <shiyu@gbdt-04.ren3kv4wanvufliwrpy4k03lsf.xx.internal.cloudapp.net>
- 06 Nov, 2020 1 commit
Guolin Ke authored
* better document for bin_construct_sample_cnt
* add warnings

Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
- 01 Nov, 2020 1 commit
Guolin Ke authored
* implement
* fix compilation
* Update config.cpp
* unify wordings

Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
- 28 Oct, 2020 1 commit
Nikita Titov authored
- 27 Oct, 2020 3 commits
Guolin Ke authored
* rollback omp sum
* remove sum reduction
Guolin Ke authored
* speed up multi-threading sum
* Apply suggestions from code review
Pavel Metrikov authored
* Add support to optimize for NDCG at a given truncation level. In order to correctly optimize for NDCG@k, one should exclude pairs containing both documents beyond the top k (as they don't affect NDCG@k when swapped).
* Update rank_objective.hpp
* Apply suggestions from code review
* Update rank_objective.hpp: remove the additional branching; get high_rank and low_rank with a single "if"
* Update config.h: add description for the lambdarank_truncation_level parameter
* Update Parameters.rst
* Update test_sklearn.py: update the expected NDCG value for a test, as it was affected by the underlying change in the algorithm
* Update test_sklearn.py: update NDCG@3 reference value
* fix R learning-to-rank tests
* Update rank_objective.hpp
* Update include/LightGBM/config.h
* Update Parameters.rst

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
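The truncation rule above can be sketched as a pair filter (illustrative, not the exact rank_objective.hpp code):

```cpp
#include <algorithm>
#include <cassert>

// When optimizing NDCG@k, a pair of documents only generates a lambda
// gradient if at least one of the two is currently ranked inside the
// top k: swapping two documents that both sit below position k cannot
// change NDCG@k, so such pairs are excluded.
inline bool PairAffectsNDCGAtK(int rank_i, int rank_j, int truncation_level) {
  // ranks are 0-based positions in the current sorted order
  return std::min(rank_i, rank_j) < truncation_level;
}
```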
- 26 Oct, 2020 2 commits
Guolin Ke authored
* fix subset bug
* typo
* add fixme tag
* bin mapper
* fix test
* fix add_features_from
* Update dataset.cpp
* fix merge bug
* added Python merge code
* added test for add_features
* Update dataset.cpp
* Update src/io/dataset.cpp
* continue implementing
* warn users about categorical features

Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Pengfei Shi authored
- 18 Oct, 2020 1 commit
James Lamb authored
* fix int64 write error
* attempt
* [WIP] [ci] [R-package] Add CI job that runs valgrind tests
* update all-successful
* install
* executable
* fix redirect stuff
* Apply suggestions from code review
* more flags
* add mc to msvc proj
* fix memory leak in mc
* Update monotone_constraints.hpp
* Update r_package.yml
* remove R_INT64_PTR
* disable openmp
* Update gbdt_model_text.cpp
* Update gbdt_model_text.cpp
* Apply suggestions from code review
* try to free vector
* free more memory
* Update src/boosting/gbdt_model_text.cpp
* fix using
* try the UNPROTECT(1);
* fix a const pointer
* fix Common
* reduce UNPROTECT
* remove UNPROTECT(1);
* fix null handle
* fix predictor
* use NULL after free
* fix a leak in test
* try more fixes
* test the effect of tests
* throw exception in Fatal
* add test back
* Apply suggestions from code review
* comment out some tests
* Apply suggestions from code review
* Apply suggestions from code review
* trying to comment out tests
* Update openmp_wrapper.h
* Apply suggestions from code review
* Update configure
* Update configure.ac
* trying to uncomment
* more comments
* more uncommenting
* more uncommenting
* fix comment
* more uncommenting
* uncomment fully-commented-out stuff
* try uncommenting more dataset tests
* uncommenting more tests
* ok getting closer
* more uncommenting
* free dataset
* skipping a test, more uncommenting
* more skipping
* re-enable OpenMP
* allow one OpenMP thing
* move valgrind to comment-only job
* Apply suggestions from code review
* changes from code review
* Apply suggestions from code review
* linting
* issue comments too
* remove issue_comment

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
- 17 Oct, 2020 1 commit
Aakarsh Gopi authored
This allows for network retries, scales well with the number of machines, and still retains the existing functionality for cases with smaller num_machines ( 500 ). Fixes #3301.

Co-authored-by: Aakarsh Gopi <aakarsh@vaticlabs.com>
- 09 Oct, 2020 1 commit
Lucas David authored
Added 'noexcept' specifier and defaulted destructor.

Co-authored-by: Lucas DAVID <lucas@isdom.isoft.fr>
- 30 Sep, 2020 1 commit
Guolin Ke authored
* Update serial_tree_learner.cpp
* Update src/treelearner/serial_tree_learner.cpp
* stable multi-threading reduction
* Update src/treelearner/serial_tree_learner.cpp
* more fixes
* Apply suggestions from code review
* Apply suggestions from code review
* Update src/boosting/gbdt.cpp