1. 23 Oct, 2025 1 commit
  2. 13 Jun, 2025 1 commit
  3. 04 Jun, 2025 1 commit
    • Matthew Douglas's avatar
      Deprecation cleanup (#1669) · 849d9449
      Matthew Douglas authored
      * Deprecation cleanup: remove histogram_scatter_add_2d
      
      * Deprecation cleanup: vectorwise_mm_dequant
      
      * Deprecation cleanup: vectorwise_quant
      
      * Remove unused test
      
      * Optimizer test cleanup
      
      * Deprecations: remove estimate_quantiles, create_quantile_map
      
      * Move deprecated test
      849d9449
  4. 25 Mar, 2025 1 commit
    • Matthew Douglas's avatar
      PyTorch Custom Operator Integration (#1544) · e82f72b3
      Matthew Douglas authored
      
      
      * Sketch out first custom op registration
      
      * Add note
      
      * Initial int8 op registration
      
      * Cleanup some deprecated functions.
      
      * Int8 ops updates; tests
      
      * Implement 4bit quant/dequant ops
      
      * Fix nested quant
      
      * cleanup
      
      * Test improvements
      
      * Clean up and improve tests
      
      * Add higher level custom op for int8 matmul + dequant + bias
      
      * Add gemv 4bit custom op
      
      * Cleanup
      
      * Implement out kwarg overloads for custom ops
      
      * Update PyTorch minimum to 2.1
      
      * Deprecation updates
      
      * Deprecation updates
      
      * Cleanup; rename int8_linear_dequant -> int8_scaled_mm
      
      * Bump min pytorch to 2.2
      
      * cleanup
      
      * Test reorganization
      
      * Remove deprecated supports_igemmlt
      
      * More cleanup
      
      * Cleanup obsolete C++/CUDA code
      
      * Cleanup
      
      * Create 'default' backend for fallback op implementations; initial CPU nf4 work
      
      * Stub out for multi-platform
      
      * Fix serialization tests for torch>=2.6.0
      
      * Add example for torch.compile e2e inference
      
      * Test update
      
      ---------
      Co-authored-by: default avatarTitus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
      e82f72b3
  5. 14 Jan, 2025 1 commit
  6. 05 Dec, 2024 1 commit
    • Matthew Douglas's avatar
      LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d
      Matthew Douglas authored
      
      
      * Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
      
      * Fix unintended change
      
      * New naive mm_dequant kernel for row-major; cleanup
      
      * fix
      
      * int8 refactor: initial sparse decomp, cleanup
      
      * Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
      
      * int8: inference optimizations, some cleanup
      
      * int8: more tests passing, cleanup
      
      * int8 - more cleanup, most tests passing
      
      * int8: specify CUDA stream for int8 ops
      
      * perf: reduce overhead from getting cudaStream ptr
      
      * Mark some functions for deprecation.
      
      * int8 sparse decomp: small perf improvement
      
      * update setup.py
      
      * Update bitsandbytes/autograd/_functions.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
      
      * int8 cleanup
      
      * Ignore ruff rule ISC001 (incompatible with formatter)
      
      * add comment
      
      * int8 more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * int8: rename / deprecate old fn signatures
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * type annotation
      
      * format update
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Add comment to explain division optimization
      
      * more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Type annotations, cleanup
      
      * remove unused kernels; improved type annotations
      
      * small perf optimization for single-GPU systems
      
      * small perf optimization for single-GPU systems
      
      * update docstrings
      
      * Improve docs and tests
      
      * Update docstring
      
      * Update test
      
      * add benchmarking script
      
      * test cleanup: add deprecated marker, move benchmarks out
      
      * Add int8 dequant function; misc improvements
      
      * int8 matmul fallback for inner dims not divisible by 4
      
      * improve register usage of kInt8VectorQuant - especially for A100/H100
      
      * disable fail-fast for package build
      
      * maxwell compat
      
      * ptxas verbose
      
      * docs update
      
      * doc update
      
      * backward fix
      
      * Bugfix sparse decomp
      
      * Int8 fix for PEFT OLoRA init
      
      * Fix test for deprecated spmm_coo
      
      * test improvement
      
      * doc update
      
      * typo
      
      * doc cleanup
      
      * docs
      
      * add inference benchmark script
      
      * Add benchmarks, doc update
      
      ---------
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      81e6345d
  7. 20 Sep, 2024 2 commits
  8. 26 Aug, 2024 1 commit
  9. 22 Aug, 2024 1 commit
  10. 29 Mar, 2024 1 commit
  11. 30 Jan, 2024 1 commit
  12. 11 Jul, 2023 1 commit
  13. 10 Jul, 2023 1 commit
  14. 09 Jul, 2023 1 commit
  15. 05 Jul, 2023 1 commit
  16. 04 Jul, 2023 2 commits
  17. 24 May, 2023 1 commit
  18. 06 May, 2023 1 commit
  19. 02 May, 2023 3 commits
  20. 01 May, 2023 6 commits
  21. 30 Apr, 2023 1 commit
  22. 29 Apr, 2023 5 commits
  23. 27 Apr, 2023 3 commits
  24. 26 Apr, 2023 1 commit
  25. 25 Apr, 2023 1 commit