1. 24 Feb, 2021 3 commits
    • [dask][python-package] include support for column array as label (#3943) · 5dacd603
      jmoralez authored
      * include support for column array as label
      
      * remove nested ifs
      
      * fix linting errors
      
      * include tests for sklearn regressors
      
      * include docstring for numpy_1d_array_to_dtype
      
      * include . at end of docstring
      
      * remove pandas import and test for regression, classification and ranking
      
      * check predictions of sklearn models as well
      
      * test training only in dask. drop pandas series tests
      
      * use PANDAS_INSTALLED and pd_Series
      
      * inline imports
      
      * use col array in fit for test_dask
      
      * include review comments
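
A minimal sketch of what "column array as label" can mean in practice: a label of shape `(n, 1)` is flattened to shape `(n,)` before training. The helper name `flatten_column_label` is hypothetical and is not taken from this commit; the actual handling in the Dask package may differ.

```python
import numpy as np


def flatten_column_label(label: np.ndarray) -> np.ndarray:
    """Return a 1-D view of a label given as a 1-D array or an (n, 1) column.

    Hypothetical helper for illustration only.
    """
    if label.ndim == 1:
        return label
    if label.ndim == 2 and label.shape[1] == 1:
        # A single-column 2-D array carries the same values as a 1-D array.
        return label.ravel()
    raise ValueError(f"label must be 1-D or a single column, got shape {label.shape}")


column_label = np.array([[0.0], [1.0], [1.0]])
flat_label = flatten_column_label(column_label)
```

Rejecting wider 2-D arrays explicitly avoids silently flattening a multi-column array into a label of the wrong length.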
    • [dask] use random ports in network setup (#3823) · 0e576575
      jmoralez authored
      * use socket.bind with port 0 and client.run to find random open ports
      
      * include test for found ports
      
      * find random open ports as default
      
      * parametrize local_listen_port. type hint to _find_random_open_port. find open ports only on workers with data.
      
      * make indentation consistent and pass list of workers to client.run
      
      * remove socket import
      
      * change random port implementation
      
      * fix test
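
The technique named in the first bullet, binding a socket to port 0 so the operating system picks a free port, can be sketched as follows. The function name `find_random_open_port` here is illustrative, and later bullets indicate the commit's final implementation changed, so this shows only the general idea and not the merged code.

```python
import socket


def find_random_open_port() -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Port 0 tells the OS to assign any free ephemeral port.
        s.bind(("", 0))
        # getsockname() returns (host, port) for AF_INET sockets.
        return s.getsockname()[1]


port = find_random_open_port()
```

The port is released when the socket closes, so another process could claim it before it is reused. In a Dask setting, a function like this would be run on each worker (for example through `client.run`, which returns a mapping from worker address to result) so that every worker reports a port that is free on its own machine.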
    • Nikita Titov's avatar
      7777852a
  2. 23 Feb, 2021 1 commit
  3. 20 Feb, 2021 1 commit
  4. 19 Feb, 2021 1 commit
  5. 17 Feb, 2021 1 commit
    • Optimize array-from-ctypes in basic.py (#3927) · de8c6105
      Alex Ford authored
      Approximately 80% of runtime when loading "low column count, high row
      count" DataFrames into Datasets is consumed in `np.fromiter`, called
      as part of the `Dataset.get_field` method.
      
      This is a particularly pernicious hotspot: unlike other ctypes-based
      methods, it is a hot loop over a Python iterator and causes
      significant GIL contention in multi-threaded applications.
      
      Replace `np.fromiter` with a direct call to `np.ctypeslib.as_array`,
      which allows a single-shot `copy` of the underlying array.
      
      This reduces the load time of a ~35 million row categorical dataframe
      with 1 column from ~5 seconds to ~1 second, and allows multi-threaded
      execution.
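
The difference between the two approaches can be sketched with a small ctypes buffer standing in for memory returned by the native library. The function names and the sample buffer are illustrative assumptions and are not copied from `basic.py`.

```python
import ctypes

import numpy as np


def array_via_fromiter(ptr, length: int) -> np.ndarray:
    """Element-by-element copy: every value passes through the Python interpreter."""
    return np.fromiter((ptr[i] for i in range(length)), dtype=np.float32, count=length)


def array_via_as_array(ptr, length: int) -> np.ndarray:
    """Wrap the pointer as an array view, then copy the whole buffer in one step."""
    # as_array needs an explicit shape when given a ctypes POINTER.
    return np.ctypeslib.as_array(ptr, shape=(length,)).copy()


n = 5
buf = (ctypes.c_float * n)(0.0, 1.0, 2.0, 3.0, 4.0)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))

slow = array_via_fromiter(ptr, n)
fast = array_via_as_array(ptr, n)

# The copy owns its data, so later writes to the buffer do not affect it.
buf[0] = 9.0
```

The `.copy()` matters because `as_array` alone returns a view over memory that the native library may later free or overwrite.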
  6. 16 Feb, 2021 6 commits
  7. 15 Feb, 2021 5 commits
  8. 10 Feb, 2021 1 commit
  9. 09 Feb, 2021 1 commit
  10. 07 Feb, 2021 1 commit
  11. 06 Feb, 2021 1 commit
  12. 03 Feb, 2021 4 commits
  13. 29 Jan, 2021 1 commit
  14. 28 Jan, 2021 1 commit
  15. 27 Jan, 2021 1 commit
  16. 26 Jan, 2021 5 commits
  17. 25 Jan, 2021 3 commits
  18. 24 Jan, 2021 2 commits
  19. 22 Jan, 2021 1 commit