Commits · b60068c810cbcac9cf4e1a8e678d8d531c40eb72 · tianlh / LightGBM-DCU

07 Dec, 2023 1 commit
- [python-package] take shallow copy of dataframe in predict (fixes #6195) (#6218) · e7979852
  José Morales authored Dec 07, 2023
  
  e7979852
07 Nov, 2023 1 commit
- [python-package] fix access to Dataset metadata in scikit-learn custom metrics and objectives (#6108) · aeafccfb
  James Lamb authored Nov 07, 2023
  aeafccfb
06 Sep, 2023 1 commit
- [python-package] simplify processing of pandas data (#6066) · ee511201
  James Lamb authored Sep 06, 2023
  
  ee511201
29 Jun, 2023 1 commit
- [python-package] make Booster and Dataset 'handle' attributes private (fixes #5313) (#5947) · b8cc8738
  James Lamb authored Jun 29, 2023
  
  b8cc8738
16 May, 2023 1 commit
- [ci] [python-package] use ruff, enforce flake8-bugbear and flake8-comprehensions checks (#5871) · d47006f4
  James Lamb authored May 16, 2023
  
  d47006f4
19 Apr, 2023 1 commit
- [python-package] remove default arguments in internal functions (#5834) · fd921d53
  James Lamb authored Apr 18, 2023
  
  fd921d53
15 Feb, 2023 1 commit
- [python-package] add Booster.set_leaf_output method (#5712) · 29796eee
  José Morales authored Feb 15, 2023
  
  29796eee
01 Feb, 2023 1 commit

[CUDA] consolidate CUDA versions (#5677) · 4f47547c

James Lamb authored Jan 31, 2023



* [ci] speed up if-else, swig, and lint conda setup

* add 'source activate'

* python constraint

* start removing cuda v1

* comment out CI

* remove more references

* revert some unnecessary changes

* revert a few more mistakes

* revert another change that ignored params

* sigh

* remove CUDATreeLearner

* fix tests, docs

* fix quoting in setup.py

* restore all CI

* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Apply suggestions from code review

* completely remove cuda_exp, update docs

---------
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

4f47547c

29 Dec, 2022 1 commit
- fix feature index in Dataset::AddFeaturesFrom (fixes #5410) (#5650) · 51edbda7
  James Lamb authored Dec 29, 2022
  
  51edbda7
04 Nov, 2022 1 commit
- [python-package] prefix several internal functions with _ (#5545) · 06a1ee25
  Madnex authored Nov 04, 2022
  
  06a1ee25
28 Aug, 2022 1 commit
- include parameters from reference dataset on subset (fixes #5402) (#5416) · 5079de4a
  José Morales authored Aug 28, 2022
```
* include parameters from reference dataset on copy

* lint

* set non-default parameters
```
  5079de4a
30 Jul, 2022 1 commit

reproducible parameter alias resolution for wrappers (fixes #5304) (#5338) · 83627ff0

José Morales authored Jul 30, 2022

* dump sorted parameter aliases

* update lgb.check.wrapper_param

* update _choose_param_value to look like lgb.check.wrapper_param

* apply suggestions from review

* reduce diff

* move DumpAliases to config

* remove unnecessary check

* restore parameter check

83627ff0

19 Jun, 2022 1 commit

[python] preserve None in `_choose_param_value()` (#5289) · 70654048

James Lamb authored Jun 19, 2022



* [python] preserve None in _choose_param_value()

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

70654048
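
The two commits above both concern how a single value is chosen when a parameter is supplied under several aliases. A minimal sketch of that idea follows, assuming a hypothetical alias table `ALIASES` and a hypothetical helper `choose_param_value_sketch` (this is not the library's `_choose_param_value()` implementation). The main name wins if present, otherwise aliases are tried in sorted order so the outcome is reproducible, and an explicit `None` is kept rather than replaced by the default.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical alias table for illustration only; the real list comes from the library.
ALIASES = {"num_iterations": {"num_iterations", "n_estimators", "num_boost_round"}}


def choose_param_value_sketch(main_name: str, params: Dict[str, Any], default: Any) -> Dict[str, Any]:
    """Return a copy of params in which only main_name remains among its aliases."""
    params = deepcopy(params)
    # Sorting makes the choice independent of set or dict iteration order.
    aliases = sorted(ALIASES.get(main_name, set()) - {main_name})
    if main_name in params:
        # The main name takes precedence, even when its value is None.
        for alias in aliases:
            params.pop(alias, None)
        return params
    for alias in aliases:
        if alias in params:
            # The first alias in sorted order wins; later ones are dropped.
            if main_name not in params:
                params[main_name] = params[alias]
            params.pop(alias)
    # Only fall back to the default when nothing was supplied at all.
    params.setdefault(main_name, default)
    return params
```

With this sketch, `{"n_estimators": None}` resolves to `{"num_iterations": None}` instead of silently picking up the default.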

05 Jun, 2022 2 commits
- [tests][python] Make test that checks original pandas data isn't modified more strict (#5267) · 27d9ad2e
  Nikita Titov authored Jun 06, 2022
```
* Update test_basic.py

* Address review comment
```
  27d9ad2e
- [python-package] make a shallow copy on dataframe rename (fixes #4596) (#5254) · 65b3db1c
  José Morales authored Jun 04, 2022
```
* dont copy dataframe on rename

* test with feature_name and 'auto'
```
  65b3db1c
22 May, 2022 1 commit
- [python-package] make a shallow copy when replacing categorical features with codes (fixes #4596) (#5225) · c000b8cc
  José Morales authored May 21, 2022
  c000b8cc
17 May, 2022 1 commit

[python-package][R-package] allow using feature names when retrieving number of bins (#5116) · 5b664b67

José Morales authored May 16, 2022

* allow using feature names when retrieving number of bins

* unname vector

* use default feature names when not defined

* lint

* apply suggestions

* remove extra comma

* add test with categorical feature

* make feature names sync more transparent

5b664b67

30 Apr, 2022 1 commit
- [c-api] check number of features when retrieving number of bins (#5183) · f53fa691
  José Morales authored Apr 30, 2022
```
* check number of features when retrieving number of bins

* check for negative values

* lint
```
  f53fa691
22 Apr, 2022 1 commit

[python-package] remove 'fobj' in favor of passing custom objective function in params (fixes #3244) (#5052) · 416ecd5a

Miguel Trejo Marrufo authored Apr 21, 2022

* feat: support custom metrics in params

* feat: support objective in params

* test: custom objective and metric

* fix: imports are incorrectly sorted

* feat: convert eval metrics str and set to list

* feat: convert single callable eval_metric to list

* test: single callable objective in params
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* feat: callable fobj in basic cv function
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: cv support objective callable
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* fix: assert in cv_res
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* docs: objective callable in params
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* recover test_boost_from_average_with_single_leaf_trees
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* linters fail
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* remove metrics helper functions
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* feat: choose objective through _choose_param_values
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: test objective through _choose_param_values
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: test objective is callable in train
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: parametrize choose_param_value with objective aliases
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: cv booster metric is none
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* fix: if string and callable choose callable
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test train uses custom objective metrics
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: cv uses custom objective metrics
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* refactor: remove fobj parameter in train and cv
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* refactor: objective through params in sklearn API
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* custom objective function in advanced_example
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* fix whitespace lint

* objective is none not a particular case for predict method
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* replace scipy.expit with custom implementation
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: set num_boost_round value to 20
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* fix: custom objective default_value is none
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* refactor: remove self._fobj
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* custom_objective default value is None
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* refactor: variables name reference dummy_obj
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* linter errors

* fix: process objective parameter when calling predict
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* linter errors

* fix: objective is None during predict call
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

416ecd5a
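
The commit above moves the custom objective from a separate `fobj` argument into the params dict and mentions a hand-written replacement for `scipy`'s `expit`. A minimal sketch of what such a callable might look like is shown here, assuming a binary log-loss objective that receives raw scores and a dataset object exposing `get_label()`. `LabelsOnly` is a hypothetical stand-in so the example runs without the library, and plain lists are used where real code would more likely return numpy arrays.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, written by hand instead of using scipy."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class LabelsOnly:
    """Hypothetical stand-in for a training dataset; only get_label() is modelled."""

    def __init__(self, labels: Sequence[float]) -> None:
        self._labels = list(labels)

    def get_label(self) -> List[float]:
        return self._labels


def logloss_objective(preds: Sequence[float], train_data: LabelsOnly) -> Tuple[List[float], List[float]]:
    """Gradient and Hessian of binary log loss with respect to the raw scores."""
    labels = train_data.get_label()
    probs = [sigmoid(s) for s in preds]
    grad = [p - y for p, y in zip(probs, labels)]
    hess = [p * (1.0 - p) for p in probs]
    return grad, hess


# Assumed usage shape after this change: the callable sits under "objective"
# in the params dict rather than being passed as a separate fobj argument.
params = {"objective": logloss_objective}
```

Keeping the objective in `params` means the same alias handling applies to it as to any other parameter, which matches the `_choose_param_values` items in the list above.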

23 Mar, 2022 1 commit

[CUDA] New CUDA version Part 1 (#4630) · 6b56a90c

shiyu1994 authored Mar 23, 2022



* new cuda framework

* add histogram construction kernel

* before removing multi-gpu

* new cuda framework

* tree learner cuda kernels

* single tree framework ready

* single tree training framework

* remove comments

* boosting with cuda

* optimize for best split find

* data split

* move boosting into cuda

* parallel synchronize best split point

* merge split data kernels

* before code refactor

* use tasks instead of features as units for split finding

* refactor cuda best split finder

* fix configuration error with small leaves in data split

* skip histogram construction of too small leaf

* skip split finding of invalid leaves

stop when no leaf to split

* support row wise with CUDA

* copy data for split by column

* copy data from host to CPU by column for data partition

* add synchronize best splits for one leaf from multiple blocks

* partition dense row data

* fix sync best split from task blocks

* add support for sparse row wise for CUDA

* remove useless code

* add l2 regression objective

* sparse multi value bin enabled for CUDA

* fix cuda ranking objective

* support for number of items <= 2048 per query

* speedup histogram construction by interleaving global memory access

* split optimization

* add cuda tree predictor

* remove comma

* refactor objective and score updater

* before use struct

* use structure for split information

* use structure for leaf splits

* return CUDASplitInfo directly after finding best split

* split with CUDATree directly

* use cuda row data in cuda histogram constructor

* clean src/treelearner/cuda

* gather shared cuda device functions

* put shared CUDA functions into header file

* change smaller leaf from <= back to < for consistent result with CPU

* add tree predictor

* remove useless cuda_tree_predictor

* predict on CUDA with pipeline

* add global sort algorithms

* add global argsort for queries with many items in ranking tasks

* remove limitation of maximum number of items per query in ranking

* add cuda metrics

* fix CUDA AUC

* remove debug code

* add regression metrics

* remove useless file

* don't use mask in shuffle reduce

* add more regression objectives

* fix cuda mape loss

add cuda xentropy loss

* use template for different versions of BitonicArgSortDevice

* add multiclass metrics

* add ndcg metric

* fix cross entropy objectives and metrics

* fix cross entropy and ndcg metrics

* add support for customized objective in CUDA

* complete multiclass ova for CUDA

* separate cuda tree learner

* use shuffle based prefix sum

* clean up cuda_algorithms.hpp

* add copy subset on CUDA

* add bagging for CUDA

* clean up code

* copy gradients from host to device

* support bagging without using subset

* add support of bagging with subset for CUDAColumnData

* add support of bagging with subset for dense CUDARowData

* refactor copy sparse subrow

* use copy subset for column subset

* add reset train data and reset config for CUDA tree learner

add destructors for cuda tree learner

* add USE_CUDA ifdef to cuda tree learner files

* check that dataset doesn't contain CUDA tree learner

* remove printf debug information

* use full new cuda tree learner only when using single GPU

* disable all CUDA code when using CPU version

* recover main.cpp

* add cpp files for multi value bins

* update LightGBM.vcxproj

* update LightGBM.vcxproj

fix lint errors

* fix lint errors

* fix lint errors

* update Makevars

fix lint errors

* fix the case with 0 feature and 0 bin

fix split finding for invalid leaves

create cuda column data when loaded from bin file

* fix lint errors

hide GetRowWiseData when cuda is not used

* recover default device type to cpu

* fix na_as_missing case

fix cuda feature meta information

* fix UpdateDataIndexToLeafIndexKernel

* create CUDA trees when needed in CUDADataPartition::UpdateTrainScore

* add refit by tree for cuda tree learner

* fix test_refit in test_engine.py

* create set of large bin partitions in CUDARowData

* add histogram construction for columns with a large number of bins

* add find best split for categorical features on CUDA

* add bitvectors for categorical split

* cuda data partition split for categorical features

* fix split tree with categorical feature

* fix categorical feature splits

* refactor cuda_data_partition.cu with multi-level templates

* refactor CUDABestSplitFinder by grouping task information into struct

* pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder

* fix misuse of reference

* remove useless changes

* add support for path smoothing

* virtual destructor for LightGBM::Tree

* fix overlapped cat threshold in best split infos

* reset histogram pointers in data partition and split finder in ResetConfig

* comment useless parameter

* fix reverse case when na is missing and default bin is zero

* fix mfb_is_na and mfb_is_zero and is_single_feature_column

* remove debug log

* fix cat_l2 when one-hot

fix gradient copy when data subset is used

* switch shared histogram size according to CUDA version

* gpu_use_dp=true when cuda test

* revert modification in config.h

* fix setting of gpu_use_dp=true in .ci/test.sh

* fix linter errors

* fix linter error

remove useless change

* recover main.cpp

* separate cuda_exp and cuda

* fix ci bash scripts

add description for cuda_exp

* add USE_CUDA_EXP flag

* switch off USE_CUDA_EXP

* revert changes in python-packages

* more careful separation for USE_CUDA_EXP

* fix CUDARowData::DivideCUDAFeatureGroups

fix set fields for cuda metadata

* revert config.h

* fix test settings for cuda experimental version

* skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version

* fix lint issue by adding a blank line

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* merge cuda.yml and cuda_exp.yml

* update python version in cuda.yml

* remove cuda_exp.yml

* remove unrelated changes

* fix compilation warnings

fix cuda exp ci task name

* recover task

* use multi-level template in histogram construction

check split only in debug mode

* ignore NVCC related lines in parameter_generator.py

* update job name for CUDA tests

* apply review suggestions

* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update header

* remove useless TODOs

* remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062

* #include <LightGBM/utils/log.h> for USE_CUDA_EXP only

* fix include order

* fix include order

* remove extra space

* address review comments

* add warning when cuda_exp is used together with deterministic

* add comment about gpu_use_dp in .ci/test.sh

* revert changing order of included headers
Co-authored-by: Yu Shi <shiyu1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

6b56a90c

15 Mar, 2022 1 commit

[c-api][python-package][R-package] expose feature num bin (#5048) · d10372e2

José Morales authored Mar 14, 2022



* expose FeatureNumBin in C api

* parametrize min_data_in_bin and add test with max_bin_by_feature

* include feature_num_bin in R package

* add suggestion from review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update error message and lint

* lint

* add call method

* minor improvements in tests

* add suggestions from review

* lint

* rename argument to feature in python and r packages
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

d10372e2

01 Mar, 2022 1 commit

[tests][python] move tests that use `train()` function defined in `engine.py` from `test_basic.py` to `test_engine.py` (#5034) · 01568cf5

Nikita Titov authored Mar 01, 2022

* Update test_basic.py

* Update test_engine.py

* Update test_engine.py

01568cf5

24 Feb, 2022 1 commit

[python-package] add support for pandas nullable types (fixes #4173) (#4927) · f1856956

José Morales authored Feb 23, 2022



* map nullable dtypes to regular float dtypes

* cast x3 to float after introducing missing values

* add test for regular dtypes

* use .astype and then values. update nullable_dtypes test and include test for regular numpy dtypes

* more specific allowed dtypes. test no copy when single float dtype df

* use np.find_common_type. set np.float128 to None when it isn't supported

* set default as type(None)

* move tests that use lgb.train to test_engine

* include np.float32 when finding common dtype

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* add linebreak
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

f1856956

23 Feb, 2022 1 commit

[python-package] use 2d collections for predictions, grads and hess in multiclass custom objective (#4925) · d670a4d6

José Morales authored Feb 22, 2022

* reshape predictions, grad and hess in multiclass custom objective

* add sklearn test. move custom obj to utils. docs for numpy

* use num_model_per_iteration to get num_classes

* update docs and dask multiclass custom objective test

* move reshaping to __inner_predict. add test for feval

* add missing note. remove extra line

d670a4d6
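
The reshaping described in the commit above can be illustrated with a small sketch. It assumes the flat multiclass layout is grouped by class first and then by row, so that the score for row `r` and class `c` sits at index `c * num_rows + r`. The helper name `flat_to_2d` is hypothetical and plain lists stand in for arrays.

```python
from typing import List, Sequence


def flat_to_2d(flat: Sequence[float], num_rows: int, num_classes: int) -> List[List[float]]:
    """Turn class-major flat scores into one row per sample, one column per class."""
    if len(flat) != num_rows * num_classes:
        raise ValueError("flat length must equal num_rows * num_classes")
    return [
        [flat[c * num_rows + r] for c in range(num_classes)]
        for r in range(num_rows)
    ]
```

For two rows and three classes, `flat_to_2d([1, 2, 3, 4, 5, 6], 2, 3)` gives `[[1, 3, 5], [2, 4, 6]]`, so a custom objective can index `preds[row][class]` directly.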

30 Dec, 2021 1 commit

[python] raise an informative error instead of segfaulting when custom objective produces incorrect output (#4815) · af5b40e1

Yaqub Alwan authored Dec 30, 2021

* fix for bad grads causing segfault

* adjust checking criteria to properly reflect reality of multi-class classifiers

* fix styling

* Line break before operator

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* add a note to the C-API docs

* rearrange text slightly

* add some tests to python package

* Update include/LightGBM/c_api.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* PR comments

* match argument is a regex and our expression has brackets ..

* rework tests

* isorting imports

* updating test to reflect that the python API does not take preds/labels as a fobj function
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

af5b40e1

03 Dec, 2021 1 commit

Add C API function that returns all parameter names with their aliases (#4829) · cf38071b

Nikita Titov authored Dec 03, 2021



* add C API function that returns all param names with aliases

* add C API function that returns all param names with aliases

* add R code

* test R code

* remove debug CI

* fix R lint

* refactor

* run CI

* fix R

* fix

* revert CI checks

* revert changes in docs

* Try to make function `const`
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* add `const` in cpp file

* address review comments and sync with `master`
Co-authored-by: James Lamb <jaylamb20@gmail.com>

cf38071b

16 Nov, 2021 1 commit

Add customized parser support (#4782) · b0137deb

chjinche authored Nov 16, 2021

* add customized parser support

* fix typo of parser_config_file description

* make delimiter as parameter of JoinedLines

b0137deb

07 Oct, 2021 1 commit
- [tests][python-package] refactor list_to_1d_numpy test to run without pandas installed (#4639) · 29857c8a
  José Morales authored Oct 07, 2021
```
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
```
  29857c8a
17 Sep, 2021 1 commit

[python-package] Support 2d collections as input for `init_score` in multiclass classification task (#4150) · f1f5ba15

José Morales authored Sep 17, 2021

* initial implementation of init_score for multiclass classification

* check for 1d or 2d collection in init_score

* remove dataset import

* initial comments

* update dask test and docstrings

* update docstrings

* move logic to set_field. reshape back on get_field

* add type hints and update docstrings for dask. fix Dataset.set_field

* revert wrong docstrings and type hints

* add extra comma for consistency

* prefix private functions with underscore

add type hints to new functions

make commas consistent in dask and basic

* add missing spaces after type hint

* remove shape condition for dataframe in is_2d_collection
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>

f1f5ba15

31 Jul, 2021 1 commit
- [python][tests] refactor tests with Sequence input (#4495) · 661bde10
  Nikita Titov authored Jul 31, 2021
  
  661bde10
30 Jul, 2021 1 commit

[python] support Dataset.get_data for Sequence input. (#4472) · 1d21d1ad

Chen Yufei authored Jul 31, 2021



* [python] support Dataset.get_data for Sequence input.

* Tweaks according to review comments.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Add test cases.

* fix import order in test_basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

1d21d1ad

07 Jul, 2021 1 commit
- [python] allow to pass some params as pathlib.Path objects (#4440) · 90342e92
  Nikita Titov authored Jul 07, 2021
```
* allow to pass some params as pathlib.Path objects

* fix lint

* improve indentation
```
  90342e92
05 Jul, 2021 1 commit

[python] minor refactoring of Python code (#4442) · 7eac5a63

Nikita Titov authored Jul 05, 2021

* Update test_sklearn.py

* Update test_basic.py

* Update dask.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update callback.py

7eac5a63

04 Jul, 2021 2 commits
- [tests] fix deprecation numpy warning (#4439) · 29052c5d
  Nikita Titov authored Jul 05, 2021
  
  29052c5d
- [python] migrate to pathlib in python tests (#4435) · cff80442
  Nikita Titov authored Jul 04, 2021
  
  cff80442
02 Jul, 2021 1 commit

[python-package] Create Dataset from multiple data files (#4089) · c359896e

Chen Yufei authored Jul 02, 2021

* [python-package] create Dataset from sampled data.

* [python-package] create Dataset from List[Sequence].

1. Use random access for data sampling
2. Support read data from multiple input files
3. Read data in batch so no need to hold all data in memory

* [python-package] example: create Dataset from multiple HDF5 file.

* fix: revert is_class implementation for seq

* fix: unwanted memory view reference for seq

* fix: seq is_class accepts sklearn matrices

* fix: requirements for example

* fix: pycode

* feat: print static code linting stage

* fix: linting: avoid shell str regex conversion

* code style: doc style

* code style: isort

* fix ci dependency: h5py on windows

* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623

* docs(python): init_from_sample summary

https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389



* remove dataset dump sample data debugging code.

* remove typo fix.

Create separate PR for this.

* fix typo in src/c_api.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* style(linting): py3 type hint for seq

* test(basic): os.path style path handling

* Revert "feat: print static code linting stage"

This reverts commit 10bd79f7f8258bea8e61c3abb8c9c7e4456a916d.

* feat(python): sequence on validation set

* minor(python): comment

* minor(python): test option hint

* style(python): fix code linting

* style(python): add pydoc for ref_dataset

* doc(python): sequence
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* revert(python): sequence class abc

* chore(python): remove rm_files

* Remove useless static_assert.

* refactor: test_basic test for sequence.

* fix lint complaint.

* remove dataset._dump_text in sequence test.

* Fix reverting typo fix.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Fix type hint, code and doc style.

* fix failing test_basic.

* Remove TODO about keep constant in sync with cpp.

* Install h5py only when running python-examples.

* Fix lint complaint.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Doc fixes, remove unused params_str in __init_from_seqs.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove unnecessary conda install in windows ci script.

* Keep param as example in dataset_from_multi_hdf5.py

* Add _get_sample_count function to remove code duplication.

* Use batch_size parameter in generate_hdf.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix after applying suggestions.

* Fix test, check idx is instance of numbers.Integral.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Expose Sequence class in Python-API doc.

* Handle Sequence object not having batch_size.

* Fix isort lint complaint.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update docstring to mention Sequence as data input.

* Remove get_one_line in test_basic.py

* Make Sequence an abstract class.

* Reduce number of tests for test_sequence.

* Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.

* empty commit to trigger ci

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.

Also rename total_nrow to num_total_row in c_api.h for consistency.

* Doc about Sequence in docs/Python-Intro.rst.

* Fix: basic.py change LGBM_SampleIndices out_len to int32.

* Add create_valid test case with Dataset from Sequence.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

c359896e
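
The commit above describes a data source that supports random access and is read in batches so that the full data never has to be held in memory. A minimal sketch of that interface follows, assuming an object with `__getitem__`, `__len__` and a `batch_size` attribute. `ListBackedSequence` and `iter_batches` are hypothetical names, and the class deliberately does not subclass the library's `Sequence` so that the example runs on its own.

```python
from typing import Iterator, List, Union

Row = List[float]


class ListBackedSequence:
    """Stand-in with a random-access interface; a real source might wrap an HDF5 file."""

    batch_size = 2  # hypothetical small batch size for the example

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def __getitem__(self, idx: Union[int, slice]) -> Union[Row, List[Row]]:
        # An int returns one row, a slice returns a list of rows.
        return self._rows[idx]

    def __len__(self) -> int:
        return len(self._rows)


def iter_batches(seq: ListBackedSequence) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(seq), seq.batch_size):
        yield seq[start:start + seq.batch_size]
```

Random access by index is what makes sampling possible without a full pass, while slicing in `batch_size` steps bounds how many rows are materialized at once.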

21 May, 2021 2 commits
- [python] improving the syntax of the fstring in the file : tests/python_package_test/test_basic.py (#4312) · da3465cb
  sayantan sadhu authored May 21, 2021
  da3465cb
- [python] handle arbitrary length feature names in Python-package (#4293) · 237ac299
  Nikita Titov authored May 21, 2021
```
* handle arbitrary length feature names in Python-package

* added tests
```
  237ac299
24 Feb, 2021 1 commit

[dask][python-package] include support for column array as label (#3943) · 5dacd603

jmoralez authored Feb 24, 2021

* include support for column array as label

* remove nested ifs

* fix linting errors

* include tests for sklearn regressors

* include docstring for numpy_1d_array_to_dtype

* include . at end of docstring

* remove pandas import and test for regression, classification and ranking

* check predictions of sklearn models as well

* test training only in dask. drop pandas series tests

* use PANDAS_INSTALLED and pd_Series

* inline imports

* use col array in fit for test_dask

* include review comments

5dacd603

16 Feb, 2021 1 commit
- [ci][python] apply isort to tests/python_package_test/test_basic.py #3958 (#3977) · 9445b2ca
  Zhuyi Xue authored Feb 15, 2021
  
  9445b2ca