1. 09 Aug, 2021 1 commit
  2. 03 Aug, 2021 1 commit
  3. 31 Jul, 2021 1 commit
  4. 30 Jul, 2021 1 commit
  5. 10 Jul, 2021 2 commits
  6. 07 Jul, 2021 3 commits
  7. 06 Jul, 2021 1 commit
  8. 05 Jul, 2021 2 commits
  9. 04 Jul, 2021 5 commits
  10. 02 Jul, 2021 1 commit
    • [python-package] Create Dataset from multiple data files (#4089) · c359896e
      Chen Yufei authored
      * [python-package] create Dataset from sampled data.
      
      * [python-package] create Dataset from List[Sequence].
      
      1. Use random access for data sampling
      2. Support reading data from multiple input files
      3. Read data in batches so that not all data needs to be held in memory
      
      * [python-package] example: create Dataset from multiple HDF5 files.
      
      * fix: revert is_class implementation for seq
      
      * fix: unwanted memory view reference for seq
      
      * fix: seq is_class accepts sklearn matrices
      
      * fix: requirements for example
      
      * fix: pycode
      
      * feat: print static code linting stage
      
      * fix: linting: avoid shell str regex conversion
      
      * code style: doc style
      
      * code style: isort
      
      * fix ci dependency: h5py on windows
      
      * [py] remove rm files in test seq
      https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623
      
      * docs(python): init_from_sample summary
      
      https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389
      
      
      
      * remove dataset dump sample data debugging code.
      
      * remove typo fix.
      
      Create separate PR for this.
      
      * fix typo in src/c_api.cpp
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * style(linting): py3 type hint for seq
      
      * test(basic): os.path style path handling
      
      * Revert "feat: print static code linting stage"
      
      This reverts commit 10bd79f7f8258bea8e61c3abb8c9c7e4456a916d.
      
      * feat(python): sequence on validation set
      
      * minor(python): comment
      
      * minor(python): test option hint
      
      * style(python): fix code linting
      
      * style(python): add pydoc for ref_dataset
      
      * doc(python): sequence
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      
      * revert(python): sequence class abc
      
      * chore(python): remove rm_files
      
      * Remove useless static_assert.
      
      * refactor: test_basic test for sequence.
      
      * fix lint complaint.
      
      * remove dataset._dump_text in sequence test.
      
      * Fix reverting typo fix.
      
      * Apply suggestions from code review
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Fix type hint, code and doc style.
      
      * fix failing test_basic.
      
      * Remove TODO about keep constant in sync with cpp.
      
      * Install h5py only when running python-examples.
      
      * Fix lint complaint.
      
      * Apply suggestions from code review
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Doc fixes, remove unused params_str in __init_from_seqs.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Remove unnecessary conda install in windows ci script.
      
      * Keep param as example in dataset_from_multi_hdf5.py
      
      * Add _get_sample_count function to remove code duplication.
      
      * Use batch_size parameter in generate_hdf.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Fix after applying suggestions.
      
      * Fix test, check idx is instance of numbers.Integral.
      
      * Update python-package/lightgbm/basic.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Expose Sequence class in Python-API doc.
      
      * Handle Sequence object not having batch_size.
      
      * Fix isort lint complaint.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Update docstring to mention Sequence as data input.
      
      * Remove get_one_line in test_basic.py
      
      * Make Sequence an abstract class.
      
      * Reduce number of tests for test_sequence.
      
      * Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.
      
      * empty commit to trigger ci
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.
      
      Also rename total_nrow to num_total_row in c_api.h for consistency.
      
      * Doc about Sequence in docs/Python-Intro.rst.
      
      * Fix: basic.py change LGBM_SampleIndices out_len to int32.
      
      * Add create_valid test case with Dataset from Sequence.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Apply suggestions from code review
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      
      * Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.
      
      * Update python-package/lightgbm/basic.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: Willian Zhang <willian@willian.email>
      Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      c359896e
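      The commit message above describes three ideas: random access for sampling, reading from several inputs, and reading in batches so the full data never sits in memory. The sketch below illustrates those ideas in plain Python only. It does not use LightGBM and is not the code from the pull request; `ListSequence`, `sample_rows`, `iter_batches`, and `DEFAULT_BATCH_SIZE` are hypothetical names chosen for this illustration. The one detail taken from the log is that a source may lack a `batch_size` attribute, in which case a default is used.

```python
import numbers
import random
from typing import Iterator, List

DEFAULT_BATCH_SIZE = 4096  # hypothetical fallback, not a value taken from LightGBM


class ListSequence:
    """Hypothetical row source (stand-in for e.g. one HDF5 file).

    It offers what the commit message asks of a sequence: a length,
    random access by integer index, and slicing for batched reads.
    """

    def __init__(self, rows: List[list], batch_size: int = DEFAULT_BATCH_SIZE):
        self._rows = rows
        self.batch_size = batch_size

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, idx):
        if isinstance(idx, numbers.Integral):
            return self._rows[idx]   # single row, random access
        if isinstance(idx, slice):
            return self._rows[idx]   # contiguous batch of rows
        raise TypeError(f"unsupported index type: {type(idx).__name__}")


def sample_rows(seqs, sample_cnt: int, seed: int = 0) -> List[list]:
    """Draw up to sample_cnt rows across all sources using random access only."""
    total = sum(len(s) for s in seqs)
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(total), min(sample_cnt, total)))
    rows = []
    for global_idx in picked:
        offset = global_idx
        for s in seqs:               # map the global index to (source, local index)
            if offset < len(s):
                rows.append(s[offset])
                break
            offset -= len(s)
    return rows


def iter_batches(seqs) -> Iterator[List[list]]:
    """Yield rows source by source, at most batch_size rows at a time."""
    for s in seqs:
        batch_size = getattr(s, "batch_size", None) or DEFAULT_BATCH_SIZE
        for start in range(0, len(s), batch_size):
            yield s[start:start + batch_size]
```

      In this sketch a sample for bin construction would come from `sample_rows`, and the full pass over the data would go through `iter_batches`, so only one batch is materialized at a time.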
  11. 29 Jun, 2021 1 commit
  12. 28 Jun, 2021 1 commit
    • [dask] add support for eval sets and custom eval functions (#4101) · b5502d19
      Frank Fineis authored
      
      
      * es WiP, need to add eval_sample_weight and eval_group
      
      * add weight, group to dask es. WiP.
      
      * dask es reorg
      
      * Update python-package/lightgbm/dask.py

      _train_part model.fit args to lines
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update tests/python_package_test/test_dask.py

      _train_part model.fit args to lines, pt2
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update python-package/lightgbm/dask.py

      _train_part model.fit args to lines pt3
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update tests/python_package_test/test_dask.py

      dask_model.fit args to lines
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update tests/python_package_test/test_dask.py
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update python-package/lightgbm/dask.py

      use is instead of id()
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update tests/python_package_test/test_dask.py
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update tests/python_package_test/test_dask.py
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: James Lamb <jaylamb20@gmail.com>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * applying changes to eval_set PR WiP
      
      * dask support for eval_names, eval_metric, eval_stopping_rounds
      
      * add evals_result checks and other eval_set attribute-related test checks. need to merge master - WiP
      
      * fix lint errors in test_dask.py
      
      * drop group_shape from _lgbmmodel_doc_fit.format for non-rankers, add support for eval_at for dask ranker
      
      * add eval_at to test_dask eval_set ranker tests
      
      * add back group_shape to lgbmmodel docs, tighten tests
      
      * drop random eval weights from early stopping, probably causing training to terminate too early
      
      * add eval data templates to sklearn fit docs, add eval data docs to dask
      
      * add n_features to _create_data, eval_set tests stop w/ desirable tree counts
      
      * import alphabetically
      
      * add back get_worker for eval_set error handling
      
      * test_dask argmin typo
      
      * push forgotten eval_names bugfix
      
      * eval_stopping_rounds -> early_stopping_rounds, fix failing non-es test
      
      * change default eval_at to tuple 1-5
      
      * re-drop get_worker
      
      * drop early stopping support from eval_set commits, move eval_set worker check prior to client.submit
      
      * add eval_class_weight and eval_init_score to lightgbm/dask, WiP
      
      * clean up eval_set tests, allow user to specify fewer eval_names, clswghts than eval_sets
      
      * remove redundant backslash
      
      * lint fixes
      
      * fix eval_at, eval_metric duplication, let eval_at be Iterable not just Tuple
      
      * use all data_outputs for test_eval_set tests
      
      * undo newlines from first pr
      
      * add custom_eval_metric test, correct issue with eval_at and metric names
      
      * move _constant_metric outside of test
      
      * dataset reference names instead of __strings__
      
      * pad eval_set parts so that each part has the same len(eval_set)
      
      * eval set code clean up
      
      * revert n_evals to be max len eval_set across all parts on worker
      
      * pylint errors in _DatasetNames
      
      * more pylint fixes
      
      * pylinting...
      
      * add by pytest.mark, mistakenly deleted during merge conflict resolution
      
      * address code review comments
      
      * add _pad_eval_names to handle nondeterministic evals_result_ valid set names
      
      * change not evaluated evals_result_ test criteria
      
      * address fit eval docs issues, switch _DatasetNames to Enum
      
      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update python-package/lightgbm/dask.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * update eval_metrics, eval_at dask fit docstr to match sklearn, make tests reflect that l2 (rmse), logloss in evals_result_ by default
      
      * address eval_set dict keys naming in docstr and training eval_set naming issue
      
      * in test_dask check for obj-default metric names in eval_results, remove check for training key
      
      * lint fixes for _pad_eval_names
      
      * remove unnecessary line break in _pad_eval_names docstr
      
      * use Enum.member syntax not Enum.member.name
      
      * remove str from supported eval_at types
      
      * add whitespace and remove DaskDataframes mention from eval_ param docstrs in _train
      
      * remove "of shape = [n_samples]" from group_shape docs
      
      * add eval_at base_doc in DaskLGBMRanker.fit
      
      * remove excess paren from eval_names docs in _train
      
      * make requested changes to test_dask.py
      
      * remove Optional() wrapper on eval_at
      
      * add _lgbmmodel_doc_custom_eval_note to dask.py fit.__doc__
      
      * fix ordering of .sklearn imports to attempt lint fix
      
      * dask custom eval note to f-string pt1
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * dask custom eval note to f-string pt 2
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * dask custom eval note to f-string pt 3
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      b5502d19
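      For context on the "custom eval functions" named in the commit title: in LightGBM's scikit-learn style API, a custom evaluation function takes the true labels and the predictions and returns a tuple of metric name, metric value, and a flag saying whether higher is better. The sketch below shows only that calling convention in plain Python. `mean_absolute_error_metric` is a hypothetical name, the metric is an arbitrary choice, and nothing here is taken from the pull request's code or tests.

```python
from typing import Sequence, Tuple


def mean_absolute_error_metric(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Tuple[str, float, bool]:
    """Hypothetical custom metric following the (name, value, is_higher_better) convention."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    value = total / len(y_true)
    # Lower error is better, so is_higher_better is False.
    return "mae", value, False
```

      A function of this shape would presumably be passed as `eval_metric` to `fit`, alongside `eval_set`, on the Dask estimators this commit extends; check the Dask API documentation for the exact parameters accepted.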
  13. 27 Jun, 2021 2 commits
  14. 26 Jun, 2021 2 commits
  15. 18 Jun, 2021 2 commits
  16. 15 Jun, 2021 1 commit
  17. 12 Jun, 2021 1 commit
  18. 09 Jun, 2021 2 commits
  19. 07 Jun, 2021 1 commit
  20. 03 Jun, 2021 1 commit
  21. 21 May, 2021 3 commits
  22. 20 May, 2021 1 commit
  23. 16 May, 2021 1 commit
  24. 07 May, 2021 1 commit
    • Precise text file parsing (#4081) · f8318088
      Chen Yufei authored
      
      
      * New build option: USE_PRECISE_TEXT_PARSER.
      
      Use fast_double_parser for text file parsing. For each number, fallback
      to strtod in case of parse failure.
      
      * Add benchmark for CSVParser with Atof and AtofPrecise.
      
      * Fix lint complaint.
      
      * Fix typo in open result error message.
      
      * Revert "Fix lint complaint."
      
      This reverts commit 92ab0b6bce9f17d7be9eaeb20f19d4a0a36f0387.
      
      * Revert "Add benchmark for CSVParser with Atof and AtofPrecise."
      
      This reverts commit 4f8639abd06c679d4382eb715a1793afd94df3d2.
      
      * Use AtofPrecise in Common::__StringToTHelper.
      
      * [option] precise_float_parser: precise float number parsing for text input.
      
      * Remove USE_PRECISE_TEXT_PARSER compile option.
      
      * test: add test for Common::AtofPrecise.
      
      * test: remove ChunkedArrayTest with 0 length.
      
      This triggers Log::Fatal which aborts the test program.
      
      * fix lint, add copyright.
      
      * Revert "test: remove ChunkedArrayTest with 0 length."
      
      This reverts commit 346c76affe9e78b6ca2738c4a56dbb9c00f31102.
      
      * Use LightGBM::Common::Sign
      
      * save precise_float_parser in model file.
      
      * Fix error checking in AtofPrecise. Add more test cases.
      
      * Remove test case that can't pass under macOS.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      f8318088
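      The first item of the commit message above describes a two-stage strategy: try a fast parser for each number and fall back to `strtod` when it fails. The Python sketch below illustrates that pattern only. It is not the C++ code from the commit, `fast_parse` is a deliberately limited hypothetical fast path (plain decimals with at most 15 digits), and Python's built-in `float()` stands in for `strtod`.

```python
import re
from typing import Optional

# Plain decimals only: optional sign, digits, optional fractional digits.
_SIMPLE_DECIMAL = re.compile(r"([+-]?)(\d+)(?:\.(\d+))?")


def fast_parse(text: str) -> Optional[float]:
    """Hypothetical fast path: returns None for any input it does not handle."""
    match = _SIMPLE_DECIMAL.fullmatch(text)
    if match is None:
        return None                      # exponents, inf, nan, whitespace, ...
    sign, int_part, frac_part = match.group(1), match.group(2), match.group(3) or ""
    digits = int_part + frac_part
    if len(digits) > 15:
        return None                      # too many digits for this simple path
    # Integer divided by a power of ten; Python rounds this division correctly.
    value = int(digits) / (10 ** len(frac_part))
    return -value if sign == "-" else value


def parse_float(text: str) -> float:
    """Try the fast path first, then fall back to the general parser."""
    value = fast_parse(text)
    if value is None:
        value = float(text)              # stand-in for the strtod fallback
    return value
```

      The point of the design is that the common case stays cheap while unusual inputs still get a correct result from the slower general-purpose parser.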
  25. 04 May, 2021 1 commit
  26. 28 Apr, 2021 1 commit