1. 05 May, 2021 1 commit
  2. 04 May, 2021 1 commit
  3. 02 May, 2021 1 commit
  4. 30 Apr, 2021 1 commit
  5. 26 Apr, 2021 1 commit
  6. 21 Apr, 2021 1 commit
  7. 19 Apr, 2021 1 commit
  8. 10 Apr, 2021 1 commit
  9. 01 Apr, 2021 1 commit
    • [tests][dask] Add voting_parallel algorithm in tests (fixes #3834) (#4088) · d517ba12
      jmoralez authored
      * include voting_parallel tree_learner in test_regressor, test_classifier and test_ranker
      
      * remove test for warnings and test for error when using feature_parallel
      
      * use real names for tree_learner in test and include test for aliases. use the error message in the test for the feature_parallel error
      
      * split all tests with rf in test_classifier
      
      * remove task parametrization for tree_learner aliases test. smaller input data from feature_parallel error
      
      * define task for tree_learner aliases
  10. 31 Mar, 2021 1 commit
  11. 30 Mar, 2021 1 commit
  12. 29 Mar, 2021 2 commits
  13. 27 Mar, 2021 1 commit
  14. 16 Mar, 2021 1 commit
  15. 15 Mar, 2021 2 commits
  16. 14 Mar, 2021 1 commit
  17. 10 Mar, 2021 1 commit
  18. 04 Mar, 2021 1 commit
    • [dask] Include support for init_score (#3950) · 37e98782
      jmoralez authored
      * include support for init_score
      
      * use dataframe from init_score and test difference with and without init_score in local model
      
      * revert refactoring
      
      * initial docs. test between distributed models with and without init_score
      
      * remove ranker from tests
      
      * test value for root node and change docs
      
      * comma
      
      * re-include parametrize
      
      * fix incorrect merge
      
      * use single init_score and the booster_ attribute
      
      * use np.float64 instead of float
  19. 24 Feb, 2021 3 commits
    • [dask][python-package] include support for column array as label (#3943) · 5dacd603
      jmoralez authored
      * include support for column array as label
      
      * remove nested ifs
      
      * fix linting errors
      
      * include tests for sklearn regressors
      
      * include docstring for numpy_1d_array_to_dtype
      
      * include . at end of docstring
      
      * remove pandas import and test for regression, classification and ranking
      
      * check predictions of sklearn models as well
      
      * test training only in dask. drop pandas series tests
      
      * use PANDAS_INSTALLED and pd_Series
      
      * inline imports
      
      * use col array in fit for test_dask
      
      * include review comments
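The idea of accepting a column array as a label can be illustrated with a minimal sketch. It assumes that a "column array" means a 2-D array of shape `(n, 1)`; the helper name `to_1d_label` is hypothetical and is not the function added by the commit.

```python
import numpy as np


def to_1d_label(y):
    """Hypothetical helper: flatten an (n, 1) column array into a 1-D label array."""
    arr = np.asarray(y)
    if arr.ndim == 1:
        # Already a flat label vector, nothing to do.
        return arr
    if arr.ndim == 2 and arr.shape[1] == 1:
        # A single column: drop the trailing axis.
        return arr.ravel()
    raise ValueError(f"expected a 1-D array or a column array, got shape {arr.shape}")


column_label = np.array([[0.0], [1.0], [0.0]])
flat_label = to_1d_label(column_label)
```

Rejecting wider 2-D inputs instead of silently flattening them keeps a multi-column target from being mistaken for a single label vector.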
    • [dask] use random ports in network setup (#3823) · 0e576575
      jmoralez authored
      * use socket.bind with port 0 and client.run to find random open ports
      
      * include test for found ports
      
      * find random open ports as default
      
      * parametrize local_listen_port. type hint to _find_random_open_port. find open ports only on workers with data.
      
      * make indentation consistent and pass list of workers to client.run
      
      * remove socket import
      
      * change random port implementation
      
      * fix test
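The first bullet of this commit mentions `socket.bind` with port 0. A minimal sketch of that technique is shown here, using only the standard library. The function name `find_random_open_port` is an illustrative stand-in, and the commit's own helper may differ (a later bullet says the implementation was changed).

```python
import socket


def find_random_open_port() -> int:
    """Ask the OS for a currently free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Port 0 tells the OS to pick any free port (assumption: IPv4, all interfaces).
        s.bind(("", 0))
        # getsockname() returns (host, port) for an AF_INET socket.
        port = s.getsockname()[1]
    # The socket is closed here, so the port is free again for the caller.
    return port


port = find_random_open_port()
```

The port is only known to be free at the moment of the call; another process could take it before it is reused, so callers may want to retry on a bind failure.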
    • Nikita Titov's avatar
      7777852a
  20. 23 Feb, 2021 1 commit
  21. 20 Feb, 2021 1 commit
  22. 19 Feb, 2021 1 commit
  23. 17 Feb, 2021 1 commit
    • Optimize array-from-ctypes in basic.py (#3927) · de8c6105
      Alex Ford authored
      Approximately 80% of runtime when loading "low column count, high row
      count" DataFrames into Datasets is consumed in `np.fromiter`, called
      as part of the `Dataset.get_field` method.
      
      This is a particularly pernicious hotspot: unlike other ctypes-based
      methods, it is a hot loop over a Python iterator and causes
      significant GIL contention in multi-threaded applications.
      
      Replace `np.fromiter` with a direct call to `np.ctypeslib.as_array`,
      which allows a single-shot `copy` of the underlying array.
      
      This reduces the load time of a ~35 million row categorical dataframe
      with 1 column from ~5 seconds to ~1 second, and allows multi-threaded
      execution.
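The difference between the two approaches named in this commit message can be sketched on a small buffer. This is an illustration only, not the code from `basic.py`: the buffer, its length, and the variable names are assumptions made for the example.

```python
import ctypes

import numpy as np

n = 5
# A small C float buffer standing in for memory returned by a native library.
buf = (ctypes.c_float * n)(0.5, 1.5, 2.5, 3.5, 4.5)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))

# Element-by-element path: every value passes through the Python interpreter.
slow = np.fromiter((ptr[i] for i in range(n)), dtype=np.float32, count=n)

# Single-shot path: view the memory as an array, then copy it once so the
# result does not depend on the lifetime of the underlying buffer.
fast = np.ctypeslib.as_array(ptr, shape=(n,)).copy()
```

Both paths produce the same values. The second avoids a per-element Python loop, which is the cost the commit message attributes to `np.fromiter`.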
  24. 16 Feb, 2021 6 commits
  25. 15 Feb, 2021 5 commits
  26. 10 Feb, 2021 1 commit
  27. 09 Feb, 2021 1 commit