Commits · 99e0a4bd7b9e7c557e593ff9172799822abc4b7d · tianlh / LightGBM-DCU

10 Nov, 2021 1 commit

Always respect forced splits, even when feature_fraction < 1.0 (fixes #4601) (#4725) · 33a2f9ec

tongwu-msft authored Nov 10, 2021

* issue fix #4601

* fix issue 4601 it2

* add tests for issue 4601

* fix warning

* fix warning

* add new line at end

* remove last line at end

* fix lint warning

* address comments

* address comments

* address comments

* fix address

* address comments

* revert seed

* fix recursive force split issue

* fix build error

* fix lint warning

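The title describes the behavior: features used by forced splits must stay usable even when `feature_fraction` randomly drops features for a tree. The sketch below illustrates only that idea. `SampleFeaturesForTree()` and its signature are hypothetical, not LightGBM's implementation, and it assumes forced splits are identified by feature index.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper (not LightGBM's code): pick about
// feature_fraction * num_features features for one tree, then add back every
// feature referenced by a forced split so the forced split can still be applied.
inline std::set<int> SampleFeaturesForTree(int num_features, double feature_fraction,
                                           const std::vector<int>& forced_features,
                                           unsigned seed) {
  std::vector<int> all(num_features);
  for (int i = 0; i < num_features; ++i) all[i] = i;
  std::mt19937 rng(seed);
  std::shuffle(all.begin(), all.end(), rng);
  // Keep at least one feature (when any exist) and never more than num_features.
  const int num_used = std::min(
      num_features, std::max(1, static_cast<int>(num_features * feature_fraction)));
  std::set<int> used(all.begin(), all.begin() + num_used);
  // Forced-split features bypass the random draw.
  used.insert(forced_features.begin(), forced_features.end());
  return used;
}
```

The forced features are added after the draw, so the size of the random sample does not change. Forced features are added on top of it instead of replacing sampled ones.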

29 Oct, 2021 1 commit
- Remove checks for label when loading dataset from binary file because label is ignored in that case (#4737) · 96ecab6f
  Nikita Titov authored Oct 29, 2021
28 Oct, 2021 2 commits
- Reset OpenMP thread number if num_threads <= 0 (#4704) · 42914830
  Zhiyuan He authored Oct 29, 2021
```
* mock func for no openmp

* use omp_get_max_threads
Co-authored-by: hzy46 <email@example.com>
```
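The bullets mention a mock for builds without OpenMP and the use of `omp_get_max_threads`. Below is a minimal sketch of how a non-positive `num_threads` could fall back to the OpenMP default. `ResolveNumThreads()` is a hypothetical name, and the actual change may also reset the thread count through `omp_set_num_threads`.

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>
#else
// Stand-in used when compiling without OpenMP (a sketch of the
// "mock func for no openmp" idea, not LightGBM's actual mock).
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical helper: a positive request is used as-is; zero or a
// negative value falls back to the OpenMP default thread count.
inline int ResolveNumThreads(int num_threads) {
  return num_threads > 0 ? num_threads : omp_get_max_threads();
}
```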
- Improve warning wordings (#4731) · 765ceadc
  Nikita Titov authored Oct 28, 2021
```
* Update dataset_loader.cpp

* Update dataset_loader.cpp

* Update dataset_loader.cpp
```
27 Oct, 2021 1 commit
- Add some warnings when loading dataset from binary file (#4724) · 5fbfa00b
  Nikita Titov authored Oct 28, 2021
25 Oct, 2021 1 commit
- Fix some parameter hints when loading from binary file (#4701) · dc02dcaf
  Zhiyuan He authored Oct 25, 2021
```
Co-authored-by: hzy46 <email@example.com>
```
20 Oct, 2021 1 commit
- Fix ASAN issues with `std::function` usage (#4673) · 13ed38ca
  david-cortes authored Oct 20, 2021
```
* don't compare std::function to nullptr ref #4633

* Update dataset_loader.h
```
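The commit message says only that `std::function` should not be compared to `nullptr`. As a general C++ point, a `std::function` can be checked for emptiness through its explicit `operator bool`. The callback type and `HasCallback()` below are hypothetical.

```cpp
#include <cassert>
#include <functional>

// Hypothetical callback type, for illustration only.
using RowCallback = std::function<void(int)>;

// Check whether a callback was supplied by using std::function's explicit
// operator bool instead of writing `callback != nullptr`.
inline bool HasCallback(const RowCallback& callback) {
  return static_cast<bool>(callback);
}
```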
13 Oct, 2021 1 commit
- fix behavior for default objective and metric (#4660) · d130bb19
  Nikita Titov authored Oct 13, 2021
08 Oct, 2021 1 commit
- fix possible precision loss in xentropy and fair loss objectives (#4651) · 1c558a54
  James Lamb authored Oct 07, 2021
05 Oct, 2021 3 commits
- remove unused `DCGCalculator::CalDCGAtK()` (#4650) · df8c10ba
  James Lamb authored Oct 05, 2021
- add param aliases from scikit-learn (#4637) · e95d5ab8
  Nikita Titov authored Oct 05, 2021
- remove unused BinMapper::SizeForSpecificBin() (#4643) · e81eaaaf
  James Lamb authored Oct 04, 2021
```
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
```
23 Sep, 2021 2 commits
- move Network method implementations from network.h to network.cpp (fixes #4464) (#4496) · e1572794
  James Lamb authored Sep 22, 2021
- simplify and speed up comparisons for splits with identical gains (#4542) · b52ecb16
  James Lamb authored Sep 22, 2021
```
* fix incorrect behavior of SplitInfo == operator for splits with identical gains

* LightSplitInfo too, and improve comment

* dont check features unnecessarily

* update LightSplitInfo too
```
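The sketch below shows a split comparison that stays deterministic when two gains are identical. It assumes a tie-break that prefers the smaller feature index and treats NaN gains as negative infinity. `SimpleSplit` and `IsBetterSplit()` are hypothetical simplifications, not LightGBM's `SplitInfo` or `LightSplitInfo`.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <limits>

// Hypothetical, simplified split record.
struct SimpleSplit {
  double gain;
  int feature;  // -1 means "no valid split"
};

// Returns true if `a` should be preferred over `b`. A higher gain wins.
// On identical gains the smaller feature index wins, so the choice is deterministic.
inline bool IsBetterSplit(const SimpleSplit& a, const SimpleSplit& b) {
  // Treat NaN gains as -infinity so they never win.
  const double neg_inf = -std::numeric_limits<double>::infinity();
  const double gain_a = std::isnan(a.gain) ? neg_inf : a.gain;
  const double gain_b = std::isnan(b.gain) ? neg_inf : b.gain;
  // Map "no feature" (-1) to the largest index so it loses ties.
  const int feature_a = a.feature == -1 ? INT_MAX : a.feature;
  const int feature_b = b.feature == -1 ? INT_MAX : b.feature;
  if (gain_a != gain_b) return gain_a > gain_b;
  return feature_a < feature_b;
}
```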
25 Aug, 2021 1 commit

[docs] Clarify the fact that predict() on a file does not support saved Datasets (fixes #4034) (#4545) · 417ba192

James Lamb authored Aug 25, 2021

* documentation changes

* add list of supported formats to error message

* add unit tests

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update per review comments

* make references consistent
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

22 Aug, 2021 1 commit

factor out .size() checks in GetDataType() (#4541) · 4db10d86

James Lamb authored Aug 22, 2021



* factor out .size() checks in GetDataType()

* Update src/io/parser.cpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

23 Jul, 2021 1 commit
- [refactor] Use `CreateSampleIndices()` in `c_api.cpp` (#4478) · 3be611e7
  Chen Yufei authored Jul 23, 2021
```
This removes code duplication for creating sample indices.
```
02 Jul, 2021 1 commit

[python-package] Create Dataset from multiple data files (#4089) · c359896e

Chen Yufei authored Jul 02, 2021

* [python-package] create Dataset from sampled data.

* [python-package] create Dataset from List[Sequence].

1. Use random access for data sampling
2. Support read data from multiple input files
3. Read data in batch so no need to hold all data in memory

* [python-package] example: create Dataset from multiple HDF5 file.

* fix: revert is_class implementation for seq

* fix: unwanted memory view reference for seq

* fix: seq is_class accepts sklearn matrices

* fix: requirements for example

* fix: pycode

* feat: print static code linting stage

* fix: linting: avoid shell str regex conversion

* code style: doc style

* code style: isort

* fix ci dependency: h5py on windows

* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623

* docs(python): init_from_sample summary

https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389



* remove dataset dump sample data debugging code.

* remove typo fix.

Create separate PR for this.

* fix typo in src/c_api.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* style(linting): py3 type hint for seq

* test(basic): os.path style path handling

* Revert "feat: print static code linting stage"

This reverts commit 10bd79f7f8258bea8e61c3abb8c9c7e4456a916d.

* feat(python): sequence on validation set

* minor(python): comment

* minor(python): test option hint

* style(python): fix code linting

* style(python): add pydoc for ref_dataset

* doc(python): sequence
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* revert(python): sequence class abc

* chore(python): remove rm_files

* Remove useless static_assert.

* refactor: test_basic test for sequence.

* fix lint complaint.

* remove dataset._dump_text in sequence test.

* Fix reverting typo fix.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Fix type hint, code and doc style.

* fix failing test_basic.

* Remove TODO about keep constant in sync with cpp.

* Install h5py only when running python-examples.

* Fix lint complaint.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Doc fixes, remove unused params_str in __init_from_seqs.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove unnecessary conda install in windows ci script.

* Keep param as example in dataset_from_multi_hdf5.py

* Add _get_sample_count function to remove code duplication.

* Use batch_size parameter in generate_hdf.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix after applying suggestions.

* Fix test, check idx is instance of numbers.Integral.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Expose Sequence class in Python-API doc.

* Handle Sequence object not having batch_size.

* Fix isort lint complaint.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update docstring to mention Sequence as data input.

* Remove get_one_line in test_basic.py

* Make Sequence an abstract class.

* Reduce number of tests for test_sequence.

* Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.

* empty commit to trigger ci

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.

Also rename total_nrow to num_total_row in c_api.h for consistency.

* Doc about Sequence in docs/Python-Intro.rst.

* Fix: basic.py change LGBM_SampleIndices out_len to int32.

* Add create_valid test case with Dataset from Sequence.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

28 Jun, 2021 1 commit
- [CUDA] fix CUDA memory error by reducing block number (fixed #4315) (#4327) · 77d9529d
  Robin Dong authored Jun 28, 2021
26 Jun, 2021 1 commit
- fix param aliases (#4387) · aab8fc18
  Nikita Titov authored Jun 26, 2021
25 Jun, 2021 1 commit
- sync for init score of binary objective function (#4332) · 0701a32d
  Arcs authored Jun 25, 2021
```
Co-authored-by: 未闲 <weixian.lzf@antfin.com>
```
03 Jun, 2021 2 commits

Add linear leaf models to json output (fixes #4186) (#4329) · 1b5bec00

Belinda Trotta authored Jun 03, 2021



* Add linear leaf models to json output

* Add closing bracket

* Move test into test_engine.py and add asserts

* Update tests/python_package_test/test_engine.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_engine.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_engine.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

skip empty bin when calculating cnt_in_bin in BinMapper::FindBin (fix #4301) (#4325) · 3dd4a3f9
shiyu1994 authored Jun 03, 2021

26 May, 2021 1 commit
- fix GatherInfoForThresholdNumerical boundary (fix #4286) (#4322) · 346f8839
  shiyu1994 authored May 26, 2021
21 May, 2021 1 commit

fix calculation of weighted gamma loss (fixes #4174) (#4283) · 4b1b4124

Michael Mayer authored May 21, 2021

* fixed weighted gamma obj

* added unit tests

* fixing linter errors

* another linter

* set seed

* fix linter (integer seed)

18 May, 2021 1 commit
- Replace division of exponential in Gamma loss (#4289) · 32fec820
  Christian Lorentzen authored May 18, 2021
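This sketch assumes the usual log-link parameterization. Under that assumption, the Gamma negative log-likelihood in terms of the raw score is, up to constants, `label * exp(-score) + score`. Its gradient is therefore `1 - label * exp(-score)` and its Hessian is `label * exp(-score)`. Writing `label * exp(-score)` in place of `label / exp(score)` is the kind of substitution the title describes. Both terms are also multiplied by a sample weight, as is usual for weighted objectives. `GammaGradHess()` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Gradient and Hessian of the Gamma loss with a log link:
//   loss(score) = label * exp(-score) + score   (constants dropped)
// label * exp(-score) is computed once and reused, with no division by exp(score).
inline void GammaGradHess(double label, double score, double weight,
                          double* grad, double* hess) {
  const double label_exp_neg = label * std::exp(-score);
  *grad = (1.0 - label_exp_neg) * weight;
  *hess = label_exp_neg * weight;
}
```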
10 May, 2021 1 commit
- [docs] remove extra spaces in comments and docs (#4269) · a8ee487a
  James Lamb authored May 10, 2021
07 May, 2021 1 commit

Precise text file parsing (#4081) · f8318088

Chen Yufei authored May 07, 2021



* New build option: USE_PRECISE_TEXT_PARSER.

Use fast_double_parser for text file parsing. For each number, fallback
to strtod in case of parse failure.

* Add benchmark for CSVParser with Atof and AtofPrecise.

* Fix lint complaint.

* Fix typo in open result error message.

* Revert "Fix lint complaint."

This reverts commit 92ab0b6bce9f17d7be9eaeb20f19d4a0a36f0387.

* Revert "Add benchmark for CSVParser with Atof and AtofPrecise."

This reverts commit 4f8639abd06c679d4382eb715a1793afd94df3d2.

* Use AtofPrecise in Common::__StringToTHelper.

* [option] precise_float_parser: precise float number parsing for text input.

* Remove USE_PRECISE_TEXT_PARSER compile option.

* test: add test for Common::AtofPrecise.

* test: remove ChunkedArrayTest with 0 length.

This triggers Log::Fatal which aborts the test program.

* fix lint, add copyright.

* Revert "test: remove ChunkedArrayTest with 0 length."

This reverts commit 346c76affe9e78b6ca2738c4a56dbb9c00f31102.

* Use LightGBM::Common::Sign

* save precise_float_parser in model file.

* Fix error checking in AtofPrecise. Add more test cases.

* Remove test case that can't pass under macOS.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

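The first bullet describes a fast parser that falls back to `strtod` for any number it fails to parse. The sketch below shows only that fallback structure and does not use fast_double_parser. Its fast path, the hypothetical `TryParseSmallInteger()`, is deliberately tiny: it accepts only integers of up to 15 digits, which are exactly representable as doubles.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical fast path: an optional '-' followed by 1 to 15 decimal digits.
// Returns false for anything else (decimals, exponents, signs like '+', ...).
inline bool TryParseSmallInteger(const char* s, double* out) {
  bool negative = false;
  if (*s == '-') { negative = true; ++s; }
  long long value = 0;
  int digits = 0;
  for (; *s >= '0' && *s <= '9'; ++s, ++digits) {
    if (digits == 15) return false;
    value = value * 10 + (*s - '0');
  }
  if (digits == 0 || *s != '\0') return false;
  *out = negative ? -static_cast<double>(value) : static_cast<double>(value);
  return true;
}

// Parse a NUL-terminated token: try the fast path first, then fall back to std::strtod.
inline bool ParseDouble(const char* s, double* out) {
  if (TryParseSmallInteger(s, out)) return true;
  char* end = nullptr;
  const double v = std::strtod(s, &end);
  if (end == s || *end != '\0') return false;  // nothing parsed, or trailing characters
  *out = v;
  return true;
}
```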

04 May, 2021 2 commits

fix param name (#4253) · fcd24535
Nikita Titov authored May 05, 2021
```
* fix param name

* Update gpu_tree_learner.h

* Update gbdt.h
```

Correct spelling (#4250) · e79716e0

Andrew Ziem authored May 04, 2021



* Correct spelling

Most changes were in comments, and there were a few changes to literals for log output.

There were no changes to variable names, function names, IDs, or functionality.

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Correct spelling

Most are code comments, but one case is a literal in a logging message.

There are a few grammar fixes too.
Co-authored-by: James Lamb <jaylamb20@gmail.com>

29 Apr, 2021 1 commit
- show specific error message in TCP accept/send/receive logs (#4128) · f97aa86e
  James Lamb authored Apr 28, 2021
27 Apr, 2021 1 commit
- Fix typo in binary file already exists error message. (#4231) · d5c2c556
  Chen Yufei authored Apr 27, 2021
23 Apr, 2021 1 commit
- added aliases to params (#4205) · 8b477ba3
  Nikita Titov authored Apr 23, 2021
22 Apr, 2021 1 commit
- when a leaf has no local data, its histogram should be cleared (#4185) · 0a847efe
  shiyu1994 authored Apr 22, 2021
15 Apr, 2021 1 commit
- fix: Dataset::CreateValid init fields which saves to binary (#4177) · 98e5a210
  Chen Yufei authored Apr 16, 2021
11 Apr, 2021 1 commit

enforce interaction constraints with monotone_constraints_method = intermediate/advanced (#4043) · 9e1d7fa1

Christoph Aymanns authored Apr 11, 2021



* add test for interaction constraints and monotone constraints

* enforce interaction constraints in RecomputeBestSplitForLeaf

* code formatting

* code formatting

* move interaction constraint test to test_engine

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

05 Apr, 2021 1 commit
- clarify DEBUG-level log about tree depth (#4126) · 6d825cd3
  James Lamb authored Apr 05, 2021
```
* clarify DEBUG-level log about tree depth

* more places
```
24 Mar, 2021 1 commit
- fix tcp_no_delay type by using int (#4058) · c591b77e
  htgeis authored Mar 25, 2021
17 Mar, 2021 1 commit

Range check for DCG position discount lookup (#4069) · 4580393f

ashok-ponnuswami-msft authored Mar 17, 2021

* Add check to prevent out of index lookup in the position discount table. Add debug logging to report number of queries found in the data.

* Change debug logging location so that we can print the data file name as well.

* Revert "Change debug logging location so that we can print the data file name as well."

This reverts commit 3981b34bd6e0530f89c4733e78e6b6603bf50d48.

* Add data file name to debug logging.

* Move log line to a place where it is output even when query IDs are read from a separate file.

* Also add the out-of-range check to rank metrics.

* Perform check after number of queries is initialized.

* Update

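Below is a sketch of the range check described in the first bullet. It assumes the standard DCG discount `1 / log2(2 + i)` for zero-based position `i`. `BuildDiscountTable()` and `TryGetDiscount()` are hypothetical, and the caller decides how to report an out-of-range position.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Precompute DCG position discounts 1 / log2(2 + i) for positions 0..max_position-1.
inline std::vector<double> BuildDiscountTable(std::size_t max_position) {
  std::vector<double> table(max_position);
  for (std::size_t i = 0; i < max_position; ++i) {
    table[i] = 1.0 / std::log2(2.0 + static_cast<double>(i));
  }
  return table;
}

// Look up a discount only if the position lies inside the table,
// instead of reading past the end of the vector.
inline bool TryGetDiscount(const std::vector<double>& table, std::size_t position,
                           double* discount) {
  if (position >= table.size()) return false;
  *discount = table[position];
  return true;
}
```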

12 Mar, 2021 1 commit
- set is_linear_ to false when it is absent from the model file (fix #3778) (#4056) · ec4bd1e0
  shiyu1994 authored Mar 13, 2021
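Below is a minimal sketch of the defaulting behavior in the title. It assumes each tree block has already been parsed into key/value pairs and that the flag is stored under an `is_linear` key. `ReadIsLinear()` is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical parser fragment: read the optional "is_linear" flag of one tree
// block, defaulting to false when the key is absent from the model file.
inline bool ReadIsLinear(const std::map<std::string, std::string>& kv) {
  const auto it = kv.find("is_linear");
  if (it == kv.end()) return false;
  return std::stoi(it->second) != 0;
}
```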