  1. 14 Aug, 2021 1 commit
  2. 26 Jul, 2021 1 commit
  3. 25 Jul, 2021 1 commit
  4. 10 Jul, 2021 1 commit
  5. 09 Jul, 2021 1 commit
  6. 02 Jul, 2021 2 commits
    • Nikita Titov
      02ca158f
    • [python-package] Create Dataset from multiple data files (#4089) · c359896e
      Chen Yufei authored
      * [python-package] create Dataset from sampled data.
      
      * [python-package] create Dataset from List[Sequence].
      
      1. Use random access for data sampling
      2. Support reading data from multiple input files
      3. Read data in batches, so there is no need to hold all data in memory
      
      * [python-package] example: create Dataset from multiple HDF5 file.
      
      * fix: revert is_class implementation for seq
      
      * fix: unwanted memory view reference for seq
      
      * fix: seq is_class accepts sklearn matrices
      
      * fix: requirements for example
      
      * fix: pycode
      
      * feat: print static code linting stage
      
      * fix: linting: avoid shell str regex conversion
      
      * code style: doc style
      
      * code style: isort
      
      * fix ci dependency: h5py on windows
      
      * [py] remove rm files in test seq
      https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623
      
      * docs(python): init_from_sample summary
      
      https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389
      
      
      
      * remove dataset dump sample data debugging code.
      
      * remove typo fix.
      
      Create separate PR for this.
      
      * fix typo in src/c_api.cpp
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * style(linting): py3 type hint for seq
      
      * test(basic): os.path style path handling
      
      * Revert "feat: print static code linting stage"
      
      This reverts commit 10bd79f7f8258bea8e61c3abb8c9c7e4456a916d.
      
      * feat(python): sequence on validation set
      
      * minor(python): comment
      
      * minor(python): test option hint
      
      * style(python): fix code linting
      
      * style(python): add pydoc for ref_dataset
      
      * doc(python): sequence
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      
      * revert(python): sequence class abc
      
      * chore(python): remove rm_files
      
      * Remove useless static_assert.
      
      * refactor: test_basic test for sequence.
      
      * fix lint complaint.
      
      * remove dataset._dump_text in sequence test.
      
      * Fix reverting typo fix.
      
      * Apply suggestions from code review
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Fix type hint, code and doc style.
      
      * fix failing test_basic.
      
      * Remove TODO about keep constant in sync with cpp.
      
      * Install h5py only when running python-examples.
      
      * Fix lint complaint.
      
      * Apply suggestions from code review
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Doc fixes, remove unused params_str in __init_from_seqs.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Remove unnecessary conda install in windows ci script.
      
      * Keep param as example in dataset_from_multi_hdf5.py
      
      * Add _get_sample_count function to remove code duplication.
      
      * Use batch_size parameter in generate_hdf.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Fix after applying suggestions.
      
      * Fix test, check idx is instance of numbers.Integral.
      
      * Update python-package/lightgbm/basic.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Expose Sequence class in Python-API doc.
      
      * Handle Sequence object not having batch_size.
      
      * Fix isort lint complaint.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Update docstring to mention Sequence as data input.
      
      * Remove get_one_line in test_basic.py
      
      * Make Sequence an abstract class.
      
      * Reduce number of tests for test_sequence.
      
      * Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.
      
      * empty commit to trigger ci
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.
      
      Also rename total_nrow to num_total_row in c_api.h for consistency.
      
      * Doc about Sequence in docs/Python-Intro.rst.
      
      * Fix: basic.py change LGBM_SampleIndices out_len to int32.
      
      * Add create_valid test case with Dataset from Sequence.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Apply suggestions from code review
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      
      * Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.
      
      * Update python-package/lightgbm/basic.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: Willian Zhang <willian@willian.email>
      Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
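The three points listed in commit c359896e (random access for sampling, several input files, batched reads) can be illustrated with a minimal sketch in plain Python. This is not the LightGBM implementation: `ChunkedRows`, `sample_rows`, and `iter_batches` are hypothetical names chosen for illustration, and in-memory lists stand in for data files such as HDF5. The sketch only assumes what the commit message states, namely that a data source supports indexing and exposes a `batch_size`.

```python
import numbers
import random


class ChunkedRows:
    """Hypothetical stand-in for one data file (not the LightGBM Sequence class)."""

    def __init__(self, rows, batch_size=4):
        self._rows = rows              # a real source would wrap a file handle instead
        self.batch_size = batch_size   # how many rows to read per batch

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, idx):
        # Random access: a single row by integer index, or a batch by slice.
        if isinstance(idx, (numbers.Integral, slice)):
            return self._rows[idx]
        raise TypeError(f"unsupported index type: {type(idx).__name__}")


def sample_rows(seqs, sample_cnt, seed=0):
    """Sample rows across several sources using only random access."""
    total = sum(len(seq) for seq in seqs)
    rng = random.Random(seed)
    picks = sorted(rng.sample(range(total), min(sample_cnt, total)))
    sampled, offset, pos = [], 0, 0
    for seq in seqs:
        end = offset + len(seq)
        # Translate global row indices into indices local to this source.
        while pos < len(picks) and picks[pos] < end:
            sampled.append(seq[picks[pos] - offset])
            pos += 1
        offset = end
    return sampled


def iter_batches(seqs):
    """Yield rows batch by batch so the full data never sits in memory at once."""
    for seq in seqs:
        for start in range(0, len(seq), seq.batch_size):
            yield seq[start:start + seq.batch_size]


if __name__ == "__main__":
    first = ChunkedRows([[i, i] for i in range(5)], batch_size=2)
    second = ChunkedRows([[i, i] for i in range(5, 8)], batch_size=2)
    print(sample_rows([first, second], sample_cnt=3))
    for batch in iter_batches([first, second]):
        print(batch)
```

In the package itself, a data source would subclass the `Sequence` class that this commit exposes in the Python API; the sketch avoids that import so it runs standalone.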
  7. 26 Jun, 2021 1 commit
  8. 24 Jun, 2021 1 commit
  9. 14 Jun, 2021 1 commit
  10. 11 Jun, 2021 1 commit
  11. 09 Jun, 2021 1 commit
  12. 04 Jun, 2021 1 commit
  13. 20 May, 2021 1 commit
  14. 11 May, 2021 1 commit
  15. 09 May, 2021 1 commit
  16. 07 May, 2021 1 commit
    • Precise text file parsing (#4081) · f8318088
      Chen Yufei authored
      
      
      * New build option: USE_PRECISE_TEXT_PARSER.
      
      Use fast_double_parser for text file parsing. For each number, fallback
      to strtod in case of parse failure.
      
      * Add benchmark for CSVParser with Atof and AtofPrecise.
      
      * Fix lint complaint.
      
      * Fix typo in open result error message.
      
      * Revert "Fix lint complaint."
      
      This reverts commit 92ab0b6bce9f17d7be9eaeb20f19d4a0a36f0387.
      
      * Revert "Add benchmark for CSVParser with Atof and AtofPrecise."
      
      This reverts commit 4f8639abd06c679d4382eb715a1793afd94df3d2.
      
      * Use AtofPrecise in Common::__StringToTHelper.
      
      * [option] precise_float_parser: precise float number parsing for text input.
      
      * Remove USE_PRECISE_TEXT_PARSER compile option.
      
      * test: add test for Common::AtofPrecise.
      
      * test: remove ChunkedArrayTest with 0 length.
      
      This triggers Log::Fatal which aborts the test program.
      
      * fix lint, add copyright.
      
      * Revert "test: remove ChunkedArrayTest with 0 length."
      
      This reverts commit 346c76affe9e78b6ca2738c4a56dbb9c00f31102.
      
      * Use LightGBM::Common::Sign
      
      * save precise_float_parser in model file.
      
      * Fix error checking in AtofPrecise. Add more test cases.
      
      * Remove test case that can't pass under macOS.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
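Commit f8318088 describes parsing each number with a fast parser and falling back to strtod when that parser fails. The pattern can be sketched in Python as below. This is an illustration under stated assumptions, not the fast_double_parser algorithm: `fast_parse` and `parse_number` are hypothetical names, the fast path is a simplified exact-arithmetic shortcut for plain decimals, and Python's built-in `float()` stands in for strtod.

```python
import re

# Plain decimals only: optional sign, digits, optional fractional digits.
_SIMPLE_DECIMAL = re.compile(r"([+-]?)([0-9]+)(?:\.([0-9]+))?")


def fast_parse(text):
    """Return a correctly rounded float for simple decimals, or None if not handled."""
    match = _SIMPLE_DECIMAL.fullmatch(text)
    if match is None:
        return None  # exponents, inf/nan, garbage: leave to the fallback
    sign, int_part, frac_part = match.group(1), match.group(2), match.group(3) or ""
    mantissa = int(int_part + frac_part)
    # The shortcut is exact only when both operands are exactly representable
    # as doubles: mantissa below 2**53 and a power of ten no larger than 10**22.
    if mantissa >= 2 ** 53 or len(frac_part) > 22:
        return None
    value = float(mantissa) / float(10 ** len(frac_part))
    return -value if sign == "-" else value


def parse_number(text):
    """Try the fast path first; fall back to the general parser on failure."""
    value = fast_parse(text)
    if value is None:
        value = float(text)  # stand-in for the strtod fallback; raises ValueError on bad input
    return value


if __name__ == "__main__":
    for token in ["0.1", "-12.5", "1e5", "9007199254740993"]:
        print(token, parse_number(token), fast_parse(token) is not None)
```

The point of the design is that the fallback keeps results correct for inputs the fast path declines, so the fast path can stay narrow. In LightGBM the behavior is switched on through the `precise_float_parser` option named in the commit message.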
  17. 04 May, 2021 1 commit
  18. 25 Apr, 2021 2 commits
  19. 23 Apr, 2021 1 commit
  20. 17 Apr, 2021 2 commits
  21. 16 Apr, 2021 2 commits
  22. 09 Apr, 2021 1 commit
  23. 05 Apr, 2021 3 commits
  24. 28 Mar, 2021 1 commit
  25. 27 Mar, 2021 1 commit
  26. 25 Mar, 2021 1 commit
    • [docs] Add alt text on images (related to #4036) (#4038) · 6ad3e6e9
      Akshita Dixit authored
      
      
      * [docs]Add alt text on images
      
      * Update docs/GPU-Windows.rst
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Update docs/GPU-Windows.rst
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Apply suggestions from code review
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Apply suggestions from code review
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Merge main branch commit updates (#1)
      
      * [docs] Add alt text to image in Parameters-Tuning.rst (#4035)
      
      * [docs] Add alt text to image in Parameters-Tuning.rst
      
      Add alt text to Leaf-wise growth image, as part of #4028
      
      * Update docs/Parameters-Tuning.rst
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * [ci] [R-package] upgrade to R 4.0.4 in CI (#4042)
      
      * [docs] update description of deterministic parameter (#4027)
      
      * update description of deterministic parameter to require using with force_row_wise or force_col_wise
      
      * Update include/LightGBM/config.h
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * update docs
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * [dask] Include support for init_score (#3950)
      
      * include support for init_score
      
      * use dataframe from init_score and test difference with and without init_score in local model
      
      * revert refactoring
      
      * initial docs. test between distributed models with and without init_score
      
      * remove ranker from tests
      
      * test value for root node and change docs
      
      * comma
      
      * re-include parametrize
      
      * fix incorrect merge
      
      * use single init_score and the booster_ attribute
      
      * use np.float64 instead of float
      
      * [ci] ignore untitle Jupyter notebooks in .gitignore (#4047)
      
      * [ci] prevent getting incompatible dask and distributed versions (#4054)
      
      * [ci] prevent getting incompatible dask and distributed versions
      
      * Update .ci/test.sh
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * empty commit
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * [ci] fix R CMD CHECK note about example timings (fixes #4049) (#4055)
      
      * [ci] fix R CMD CHECK note about example timings (fixes #4049)
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * empty commit
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * [ci] add CMake + R 3.6 test back (fixes #3469) (#4053)
      
      * [ci] add CMake + R 3.6 test back (fixes #3469)
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Update .ci/test_r_package_windows.ps1
      
      * -Wait and remove rtools40
      
      * empty commit
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * [dask] include multiclass-classification task in tests (#4048)
      
      * include multiclass-classification task and task_to_model_factory dicts
      
      * define centers coordinates. flatten init_scores within each partition for multiclass-classification
      
      * include issue comment and fix linting error
      
      * Update index.rst (#4029)
      
      Add alt text to logo image
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * [dask] raise more informative error for duplicates in 'machines' (fixes #4057) (#4059)
      
      * [dask] raise more informative error for duplicates in 'machines'
      
      * uncomment
      
      * avoid test failure
      
      * Revert "avoid test failure"
      
      This reverts commit 9442bdf00f193a19a923dc0deb46b7822cb6f601.
      
      * [dask] add tutorial documentation (fixes #3814, fixes #3838) (#4030)
      
      * [dask] add tutorial documentation (fixes #3814, fixes #3838)
      
      * add notes on saving the model
      
      * quick start examples
      
      * add examples
      
      * fix timeouts in examples
      
      * remove notebook
      
      * fill out prediction section
      
      * table of contents
      
      * add line back
      
      * linting
      
      * isort
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * move examples under python-guide
      
      * remove unused pickle import
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * set 'pending' commit status for R Solaris optional workflow (#4061)
      
      * [docs] add Yu Shi to repo maintainers (#4060)
      
      * Update FAQ.rst
      
      * Update CODEOWNERS
      
      * set is_linear_ to false when it is absent from the model file (fix #3778) (#4056)
      
      * Add CMake option to enable sanitizers and build gtest (#3555)
      
      * Add CMake option to enable sanitizer
      
      * Set up gtest
      
      * Address reviewer's feedback
      
      * Address reviewer's feedback
      
      * Update CMakeLists.txt
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * added type hint (#4070)
      
      * [ci] run Dask examples on CI (#4064)
      
      * Update Parallel-Learning-Guide.rst
      
      * Update test.sh
      
      * fix path
      
      * address review comments
      
      * [python-package] add type hints on Booster.set_network() (#4068)
      
      * [python-package] add type hints on Booster.set_network()
      
      * change behavior
      
      * [python-package] Some mypy fixes (#3916)
      
      * Some mypy fixes
      
      * address James' comments
      
      * Re-introduce pass in empty classes
      
      * Update compat.py
      
      Remove extra lines
      
      * [dask] [ci] fix flaky network-setup test (#4071)
      
      * [tests][dask] simplify code in Dask tests (#4075)
      
      * simplify Dask tests code
      
      * enable CI
      
      * disable CI
      
      * Revert "[ci] prevent getting incompatible dask and distributed versions (#4054)" (#4076)
      
      This reverts commit 4e9c9768.
      
      * Fix parsing of non-finite values (#3942)
      
      * Fix index out-of-range exception generated by BaggingHelper on small datasets.
      
      Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.
      
      * Update goss.hpp
      
      * Update goss.hpp
      
      * Add API method LGBM_BoosterPredictForMats, which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires data given as a contiguous array)
      
      * Fix incorrect upstream merge
      
      * Add link to LightGBM.NET
      
      * Fix indenting to 2 spaces
      
      * Dummy edit to trigger CI
      
      * Dummy edit to trigger CI
      
      * remove duplicate functions from merge
      
      * Fix parsing of non-finite values.  Current implementation silently returns zero when input string is "inf", "-inf", or "nan" when compiled with VS2017, so instead just explicitly check for these values and fail if there is no match.  No attempt to optimise string allocations in this implementation since it is usually rarely invoked.
      
      * Dummy commit to trigger CI
      
      * Also handle -nan in double parsing method
      
      * Update include/LightGBM/utils/common.h
      
      Remove trailing whitespace to pass linting tests
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * [dask] remove unused imports from typing (#4079)
      
      * Range check for DCG position discount lookup (#4069)
      
      * Add check to prevent out of index lookup in the position discount table. Add debug logging to report number of queries found in the data.
      
      * Change debug logging location so that we can print the data file name as well.
      
      * Revert "Change debug logging location so that we can print the data file name as well."
      
      This reverts commit 3981b34bd6e0530f89c4733e78e6b6603bf50d48.
      
      * Add data file name to debug logging.
      
      * Move log line to a place where it is output even when query IDs are read from a separate file.
      
      * Also add the out-of-range check to rank metrics.
      
      * Perform check after number of queries is initialized.
      
      * Update
      
      * [ci] upgrade R CI scripts to work on Ubuntu 20.04 (#4084)
      
      * [ci] install additional LaTeX packages in R CI jobs
      
      * update autoconf version
      
      * bump upper limit on package size to 100
      
      * [SWIG] Add streaming data support + cpp tests (#3997)
      
      * [feature] Add ChunkedArray to SWIG
      
      * Add ChunkedArray
      * Add ChunkedArray_API_extensions.i
      * Add SWIG class wrappers
      
      * Address some review comments
      
      * Fix linting issues
      
      * Move test to tests/test_ChunkedArray_manually.cpp
      
      * Add test note
      
      * Move ChunkedArray to include/LightGBM/utils/
      
      * Declare more explicit types of ChunkedArray in the SWIG API.
      
      * Port ChunkedArray tests to googletest
      
      * Please C++ linter
      
      * Address StrikerRUS' review comments
      
      * Update SWIG doc & disable ChunkedArray<int64_t>
      
      * Use CHECK_EQ instead of assert
      
      * Change include order (linting)
      
      * Rename ChunkedArray -> chunked_array files
      
      * Change header guards
      
      * Address last comments from StrikerRUS
      
      * store all CMake files in one place (#4087)
      
      * v3.2.0 release (#3872)
      
      * Update VERSION.txt
      
      * update appveyor.yml and configure
      
      * fix Appveyor builds
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
      
      * [ci] Bump version for development (#4094)
      
      * Update .appveyor.yml
      
      * Update cran-comments.md
      
      * Update VERSION.txt
      
      * update configure
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * [ci] fix flaky Azure Pipelines jobs (#4095)
      
      * Update test.sh
      
      * Update setup.sh
      
      * Update .vsts-ci.yml
      
      * Update test.sh
      
      * Update setup.sh
      
      * Update .vsts-ci.yml
      
      * Update setup.sh
      
      * Update setup.sh
      Co-authored-by: Subham Agrawal <34346812+subhamagrawal7@users.noreply.github.com>
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: jmoralez <jmoralz92@gmail.com>
      Co-authored-by: marcelonieva7 <72712805+marcelonieva7@users.noreply.github.com>
      Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
      Co-authored-by: Deddy Jobson <dedjob@hotmail.com>
      Co-authored-by: Alberto Ferreira <AlbertoEAF@users.noreply.github.com>
      Co-authored-by: mjmckp <mjmckp@users.noreply.github.com>
      Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
      Co-authored-by: ashok-ponnuswami-msft <57648631+ashok-ponnuswami-msft@users.noreply.github.com>
      Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      Co-authored-by: Subham Agrawal <34346812+subhamagrawal7@users.noreply.github.com>
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: jmoralez <jmoralz92@gmail.com>
      Co-authored-by: marcelonieva7 <72712805+marcelonieva7@users.noreply.github.com>
      Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
      Co-authored-by: Deddy Jobson <dedjob@hotmail.com>
      Co-authored-by: Alberto Ferreira <AlbertoEAF@users.noreply.github.com>
      Co-authored-by: mjmckp <mjmckp@users.noreply.github.com>
      Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
      Co-authored-by: ashok-ponnuswami-msft <57648631+ashok-ponnuswami-msft@users.noreply.github.com>
      Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
  27. 15 Mar, 2021 1 commit
  28. 11 Mar, 2021 1 commit
  29. 10 Mar, 2021 2 commits
  30. 04 Mar, 2021 1 commit
  31. 02 Mar, 2021 1 commit
  32. 24 Feb, 2021 1 commit
  33. 23 Feb, 2021 1 commit