- 19 Feb, 2020 1 commit
Guolin Ke authored
* reset
* fix a bug
* fix test
* Update c_api.h
* support to no filter features by min_data
* add warning in reset config
* refine warnings for override dataset's parameter
* some cleans
* clean code
* clean code
* refine C API function doxygen comments
* refined new param description
* refined doxygen comments for R API function
* removed stuff related to int8
* break long line in warning message
* removed tests which results cannot be validated anymore
* added test for warnings about unchangeable params
* write parameter from dataset to booster
* consider free_raw_data.
* fix params
* fix bug
* implementing R
* fix typo
* filter params in R
* fix R
* not min_data
* refined tests
* fixed linting
* refine
* pylint
* add docstring
* fix docstring
* R lint
* updated description for C API function
* use param aliases in Python
* fixed typo
* fixed typo
* added more params to test
* removed debug print
* fix dataset construct place
* fix merge bug
* Update feature_histogram.hpp
* add is_sparse back
* remove unused parameters
* fix lint
* add data random seed
* update
* [R-package] centralized Dataset parameter aliases and added tests on Dataset parameter updating (#2767)

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
- 14 Jan, 2020 1 commit
Nikita Titov authored
* transfer and enhance test for trees_to_dataframe
* fixed bug in Python 2
- 10 Jan, 2020 1 commit
Patrick Ford authored
* trees_to_df method and unit test added. PEP 8 fixes for integration.
* Co-Authored-By: Nikita Titov <nekit94-08@mail.ru> Post-review changes
* changes from second round of reviews from striker
* third round of review. formatting and added 2 more tests
* replaced pandas dot attribute accessor with string attribute accessor
* dealt with single tree edge case and minor refactor of tests
* slight refactor for checking if tree is a single node
- 27 Oct, 2019 2 commits
Nikita Titov authored
* speed up tests
* more updates
* fixed pylint
* updated tests
* Update test_sklearn.py
* test that indices are sorted internally
Nikita Titov authored
- 03 Oct, 2019 1 commit
Guolin Ke authored
* check the shape for mat, csr and csc
* guess from csr
* support file checking
* better error msg
* grammar
* clean code
* code clean
* check range for CSR
* Update test_.py
* Update test_.py
* added tests
- 26 Sep, 2019 1 commit
Nikita Titov authored
* make dump_text() private
* updated test
- 09 Sep, 2019 1 commit
Nikita Titov authored
* keep consistent state for Dataset fields
* hotfix
- 20 Jun, 2019 1 commit
Nikita Titov authored
* Update test.py
* Update test_consistency.py
* Update test_basic.py
* Update test_sklearn.py
* Update test_sklearn.py
* Update test_engine.py
* more replacements
- 04 Apr, 2019 1 commit
remcob-gr authored
* Add configuration parameters for CEGB.
* Add skeleton CEGB tree learner
  Like the original CEGB version, this inherits from SerialTreeLearner. Currently, it changes nothing from the original.
* Track features used in CEGB tree learner.
* Pull CEGB tradeoff and coupled feature penalty from config.
* Implement finding best splits for CEGB
  This is heavily based on the serial version, but just adds using the coupled penalties.
* Set proper defaults for cegb parameters.
* Ensure sanity checks don't switch off CEGB.
* Implement per-data-point feature penalties in CEGB.
* Implement split penalty and remove unused parameters.
* Merge changes from CEGB tree learner into serial tree learner
* Represent features_used_in_data by a bitset, to reduce the memory overhead of CEGB, and add sanity checks for the lengths of the penalty vectors.
* Fix bug where CEGB would incorrectly penalise a previously used feature
  The tree learner did not update the gains of previously computed leaf splits when splitting a leaf elsewhere in the tree. This caused it to prefer new features due to incorrectly penalising splitting on previously used features.
* Document CEGB parameters and add them to the appropriate section.
* Remove leftover reference to cegb tree learner.
* Remove outdated diff.
* Fix warnings
* Fix minor issues identified by @StrikerRUS.
* Add docs section on CEGB, including citation.
* Fix link.
* Fix CI failure.
* Add some unit tests
* Fix pylint issues.
* Fix remaining pylint issue
- 07 Mar, 2019 1 commit
Nikita Titov authored
* fixed number of tests in pytest
* fixed data shape and removed unused code
* refactored tests
* hotfix
* hotfix
- 26 Feb, 2019 1 commit
remcob-gr authored
* Initial attempt to implement appending features in-memory to another data set
  The intent is for this to enable munging files together easily, without needing to round-trip via numpy or write multiple copies to disk. In turn, that enables working more efficiently with data sets that were written separately.
* Implement Dataset.dump_text, and fix small bug in appending of group bin boundaries.
  Dumping to text enables us to compare results, without having to worry about issues like features being reordered.
* Add basic tests for validation logic for add_features_from.
* Remove various internal mapping items from dataset text dumps
  These are too sensitive to the exact feature order chosen, which is not visible to the user. Including them in tests appears unnecessary, as the data dumping code should provide enough coverage.
* Add test that add_features_from results in identical data sets according to dump_text.
* Add test that booster behaviour after using add_features_from matches that of training on the full data
  This checks:
  - That training after add_features_from works at all
  - That add_features_from does not cause training to misbehave
* Expose feature_penalty and monotone_types/constraints via get_field
  These getters allow us to check that add_features_from does the right thing with these vectors.
* Add tests that add_features correctly handles feature_penalty and monotone_constraints.
* Ensure add_features_from properly frees the added dataset and add unit test for this
  Since add_features_from moves the feature group pointers from the added dataset to the dataset being added to, the added dataset is invalid after the call. We must ensure we do not try and access this handle.
* Remove some obsolete TODOs
* Tidy up DumpTextFile by using a single iterator for each feature
  These iterators were also passed around as raw pointers without being freed, which is now fixed.
* Factor out offsetting logic in AddFeaturesFrom
* Remove obsolete TODO
* Remove another TODO
  This one is debatable, test code can be a bit messy and duplicate-heavy, factoring it out tends to end badly. Leaving this for now, will revisit if adding more tests later on becomes a mess.
* Add documentation for newly-added methods.
* Fix whitespace issues identified by pylint.
* Fix a few more whitespace issues.
* Fix doc comments
* Implement deep copying for feature groups.
* Replace awkward std::move usage by emplace_back, and reduce vector size to num_features rather than num_total_features.
* Copy feature groups in addFeaturesFrom, rather than moving them.
* Fix bugs in FeatureGroup copy constructor and ensure source dataset remains usable
* Add reserve to PushVector and PushOffset
* Move definition of Clone into class body
* Fix PR review issues
* Fix for loop increment style.
* Fix test failure
* Some more docstring fixes.
* Remove blank line
- 11 Oct, 2018 1 commit
Nikita Titov authored
* break huge lines in sklearn tests
* break huge line in plotting tests
* break huge lines in basic tests
* multiple enhancements in engine tests
* multiple enhancements in sklearn tests
* hotfixes
* break huge lines and use with statement in C API test
* make NDCG test more strict
- 10 Sep, 2018 1 commit
Nikita Titov authored
- 27 Aug, 2018 1 commit
Nikita Titov authored
* added NumberOfTotalModel and NumModelPerIteration to C_API and python-package
* fixed tests
* added tests for current_iteration, num_trees, num_model_per_iteration methods
* break huge line in test
* hotfix
- 07 Jul, 2018 1 commit
Fedor Korotkiy authored
- 20 Jun, 2018 1 commit
Nikita Titov authored
* removed excess import
* added tests for plotting trees in Python
* refined module_INSTALLED mechanism
* added note about that create_tree_digraph is better than plot_tree
- 10 May, 2018 1 commit
Nikita Titov authored
* fixed docs
* reworked predict method of sklearn wrapper
* fixed encapsulation
* added test
* fixed consistency between docstring and params docs
* fixed verbose
* replaced predict_proba with predict in test
* fixed verbose again
* fixed fraction params descriptions
* added description of skip_drop and drop_rate constraints
* fixed subsample_freq consistency with C++ default value
* fixed nice look of params list
* made force splits json file example clickable
* fixed nice look of metrics list and added comma
* reduced warning in test about same param specified twice
* replaced pred_parameter with **kwargs in predict method
* added test for **kwargs in predict method
* fixed warnings
* fixed pylint
- 26 Nov, 2017 1 commit
Guolin Ke authored
* remove protobuf
* add version number
* remove pmml script
* use float for split gain
* fix warnings
* refine the read model logic of gbdt
* fix compile error
* improve decode speed
* fix some bugs
* fix double accuracy problem
* fix bug
* multi-thread save model
* speed up save model to string
* parallel save/load model
* fix some warnings.
* fix warnings.
* fix a bug
* remove debug output
* fix doc
* fix max_bin warning in tests.
* fix max_bin warning
* fix pylint
* clean code for stringToArray
* clean code for TToString
* remove max_bin
* replace "class" with typename
- 18 Aug, 2017 1 commit
Guolin Ke authored
- 30 May, 2017 1 commit
Guolin Ke authored
* fix multi-threading.
* fix name style.
* support in CLI version.
* remove warnings.
* Not default parameters.
* fix if...else... .
* fix bug.
* fix warning.
* refine c_api.
* fix R-package.
* fix R's warning.
* fix tests.
* fix pep8 .
- 29 May, 2017 1 commit
cbecker authored
* Add early stopping for prediction
* Fix GBDT if-else prediction with early stopping
* Small C++ embellishments to early stopping API and functions
* Fix early stopping efficiency issue by creating a singleton for no early stopping
* Python improvements to early stopping API
* Add assertion check for binary and multiclass prediction score length
* Update vcxproj and vcxproj.filters with new early stopping files
* Remove inline from PredictRaw(), the linker was not able to find it otherwise
- 11 May, 2017 1 commit
Tsukasa OMOTO authored
https://docs.pytest.org/
- 13 Apr, 2017 1 commit
Huan Zhang authored
- 01 Mar, 2017 2 commits
- 09 Jan, 2017 1 commit
wxchan authored
* add pmml to test
* refine pmml.py
* use ~n instead of -n-1
* change map to list comprehension
* fix check
* fix 'use ~n instead of -n-1'
* fix exception
- 04 Jan, 2017 3 commits
Guolin Ke authored
wxchan authored
* format python code with pep8
* **DO NOT MERGE** deliberately break rules to see what will happen during check
* Revert "**DO NOT MERGE** deliberately break rules to see what will happen during check"
  This reverts commit 0db93cd7a43c7efa43a2112ada43d46c6f9115d9.
* fix format in test.py
* add docs for pep-8
Guolin Ke authored
- 02 Jan, 2017 2 commits
- 01 Jan, 2017 1 commit
wxchan authored
* support pickle
* add pickle/joblib test; change test_basic to unittest
* remove file for deepcopy
* fix tests
* test basic predict from file
* Revert "test basic predict from file"
  This reverts commit 60d2c3158537fd56081f60f1d6d120cedd782887.
* test predict from file
* use tempfile for copy & pickle
* use tempfile w/o binary mode
* clean test
- 08 Dec, 2016 1 commit
Guolin Ke authored
Provide a high level Dataset class for easy use.
- 05 Dec, 2016 1 commit
Guolin Ke authored
Categorical feature support (#108)
- 02 Dec, 2016 1 commit
wxchan authored
1. merge python-package
2. add dump model to json
3. fix bugs
4. clean code with pylint
5. update python examples
- 30 Nov, 2016 2 commits
- 24 Nov, 2016 1 commit
Guolin Ke authored