1. 19 Feb, 2021 1 commit
  2. 17 Feb, 2021 1 commit
      Optimize array-from-ctypes in basic.py (#3927) · de8c6105
      Alex Ford authored
      Approximately 80% of runtime when loading "low column count, high row
      count" DataFrames into Datasets is consumed in `np.fromiter`, called
      as part of the `Dataset.get_field` method.
      
      This is a particularly pernicious hotspot: unlike other ctypes-based
      methods, it is a hot loop over a Python iterator and causes
      significant GIL contention in multi-threaded applications.
      
      Replace `np.fromiter` with a direct call to `np.ctypeslib.as_array`,
      which allows a single-shot `copy` of the underlying array.
      
      This reduces the load time of a ~35-million-row categorical DataFrame
      with 1 column from ~5 seconds to ~1 second, and allows multi-threaded
      execution.
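A minimal sketch of the before/after pattern described above, assuming a float32 field exposed by the C API as a `ctypes.POINTER(ctypes.c_float)` plus a known length. The function names here are hypothetical and not the ones used in basic.py. The old path lets `np.fromiter` pull each element through the Python iterator protocol; the new path wraps the pointer as an array view with `np.ctypeslib.as_array` and copies it in one shot.

```python
import ctypes

import numpy as np


def array_via_fromiter(cptr, length):
    # Old pattern (hypothetical name): NumPy iterates the ctypes pointer
    # element by element in Python, which is slow and holds the GIL per item.
    return np.fromiter(cptr, dtype=np.float32, count=length)


def array_via_as_array(cptr, length):
    # New pattern (hypothetical name): wrap the raw pointer as a NumPy view of
    # `length` elements (no per-element Python work), then copy once so the
    # result no longer aliases memory owned by the C library.
    return np.ctypeslib.as_array(cptr, shape=(length,)).copy()
```

Both functions return the same float32 values; the `.copy()` matters because the view from `as_array` would otherwise point at a buffer the C side may free or overwrite.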
  3. 16 Feb, 2021 6 commits
  4. 15 Feb, 2021 6 commits
  5. 10 Feb, 2021 1 commit
  6. 09 Feb, 2021 1 commit
  7. 07 Feb, 2021 1 commit
  8. 06 Feb, 2021 1 commit
  9. 03 Feb, 2021 4 commits
  10. 01 Feb, 2021 1 commit
  11. 31 Jan, 2021 3 commits
  12. 29 Jan, 2021 1 commit
  13. 28 Jan, 2021 1 commit
  14. 27 Jan, 2021 1 commit
  15. 26 Jan, 2021 5 commits
  16. 25 Jan, 2021 3 commits
  17. 24 Jan, 2021 2 commits
  18. 22 Jan, 2021 1 commit