1. 25 Mar, 2025 1 commit
    • PyTorch Custom Operator Integration (#1544) · e82f72b3
      Matthew Douglas authored
      
      
      * Sketch out first custom op registration
      
      * Add note
      
      * Initial int8 op registration
      
      * Cleanup some deprecated functions.
      
      * Int8 ops updates; tests
      
      * Implement 4bit quant/dequant ops
      
      * Fix nested quant
      
      * cleanup
      
      * Test improvements
      
      * Clean up and improve tests
      
      * Add higher level custom op for int8 matmul + dequant + bias
      
      * Add gemv 4bit custom op
      
      * Cleanup
      
      * Implement out kwarg overloads for custom ops
      
      * Update PyTorch minimum to 2.1
      
      * Deprecation updates
      
      * Deprecation updates
      
      * Cleanup; rename int8_linear_dequant -> int8_scaled_mm
      
      * Bump min pytorch to 2.2
      
      * cleanup
      
      * Test reorganization
      
      * Remove deprecated supports_igemmlt
      
      * More cleanup
      
      * Cleanup obsolete C++/CUDA code
      
      * Cleanup
      
      * Create 'default' backend for fallback op implementations; initial CPU nf4 work
      
      * Stub out for multi-platform
      
      * Fix serialization tests for torch>=2.6.0
      
      * Add example for torch.compile e2e inference
      
      * Test update
      
      ---------
      Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
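      The commit above wires bitsandbytes kernels into PyTorch's custom-operator machinery. As
      orientation only, here is a minimal sketch of that machinery using the string-based
      torch.library registration API (assuming a recent PyTorch, roughly 2.4+): a namespaced op
      with a declared schema, a reference CPU implementation, and a fake (shape-only)
      implementation so torch.compile can trace through it. The op name mylib::int8_scaled_mm,
      its parameters, and its dequantization semantics are illustrative, not the bitsandbytes API.

      ```python
      import torch

      # Hypothetical op: int8 matmul followed by dequantization with per-row/per-column scales.
      torch.library.define(
          "mylib::int8_scaled_mm",
          "(Tensor A, Tensor B, Tensor row_scales, Tensor col_scales, Tensor? bias=None) -> Tensor",
      )

      @torch.library.impl("mylib::int8_scaled_mm", "cpu")
      def _(A, B, row_scales, col_scales, bias=None):
          # Reference path: accumulate the int8 x int8 product in int32, then scale back to
          # floating point. A real backend would dispatch to cuBLASLt or custom CUDA kernels.
          out = (A.to(torch.int32) @ B.to(torch.int32).t()).float()
          out = out * (row_scales.view(-1, 1) * col_scales.view(1, -1))
          if bias is not None:
              out = out + bias
          return out.to(torch.float16)

      @torch.library.register_fake("mylib::int8_scaled_mm")
      def _(A, B, row_scales, col_scales, bias=None):
          # Shape/dtype-only implementation: lets torch.compile and meta tensors trace the op
          # without running any real computation.
          return A.new_empty((A.shape[0], B.shape[0]), dtype=torch.float16)

      # Ops registered this way are reachable under torch.ops.<namespace>.<name>.
      A = torch.randint(-127, 127, (4, 8), dtype=torch.int8)
      B = torch.randint(-127, 127, (16, 8), dtype=torch.int8)
      out = torch.ops.mylib.int8_scaled_mm(A, B, torch.ones(4), torch.ones(16))
      ```

      With a fake implementation registered, torch.compile can trace models that call the op end
      to end, which is presumably what the torch.compile e2e inference example in this commit
      exercises.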
  2. 14 Jan, 2025 1 commit
  3. 05 Dec, 2024 1 commit
    • LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d
      Matthew Douglas authored
      
      
      * Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
      
      * Fix unintended change
      
      * New naive mm_dequant kernel for row-major; cleanup
      
      * fix
      
      * int8 refactor: initial sparse decomp, cleanup
      
      * Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
      
      * int8: inference optimizations, some cleanup
      
      * int8: more tests passing, cleanup
      
      * int8 - more cleanup, most tests passing
      
      * int8: specify CUDA stream for int8 ops
      
      * perf: reduce overhead from getting cudaStream ptr
      
      * Mark some functions for deprecation.
      
      * int8 sparse decomp: small perf improvement
      
      * update setup.py
      
      * Update bitsandbytes/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
      
      * int8 cleanup
      
      * Ignore ruff rule ISC001 (incompatible with formatter)
      
      * add comment
      
      * int8 more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * int8: rename / deprecate old fn signatures
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * type annotation
      
      * format update
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Add comment to explain division optimization
      
      * more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Type annotations, cleanup
      
      * remove unused kernels; improved type annotations
      
      * small perf optimization for single-GPU systems
      
      * small perf optimization for single-GPU systems
      
      * update docstrings
      
      * Improve docs and tests
      
      * Update docstring
      
      * Update test
      
      * add benchmarking script
      
      * test cleanup: add deprecated marker, move benchmarks out
      
      * Add int8 dequant function; misc improvements
      
      * int8 matmul fallback for inner dims not divisible by 4
      
      * improve register usage of kInt8VectorQuant - especially for A100/H100
      
      * disable fail-fast for package build
      
      * maxwell compat
      
      * ptxas verbose
      
      * docs update
      
      * doc update
      
      * backward fix
      
      * Bugfix sparse decomp
      
      * Int8 fix for PEFT OLoRA init
      
      * Fix test for deprecated spmm_coo
      
      * test improvement
      
      * doc update
      
      * typo
      
      * doc cleanup
      
      * docs
      
      * add inference benchmark script
      
      * Add benchmarks, doc update
      
      ---------
      Co-authored-by: Aarni Koskela <akx@iki.fi>
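      The commit above is the first part of the LLM.int8() refactor. As background only, here is
      a pure-PyTorch sketch of the arithmetic these kernels implement: row-wise absmax int8
      quantization, an int32-accumulated matmul, and the mixed-precision decomposition that keeps
      outlier feature columns in higher precision. Function names, layouts, and the outlier
      threshold below are illustrative, not the bitsandbytes kernels or API.

      ```python
      import torch

      def int8_rowwise_quant(x: torch.Tensor):
          # Row-wise absmax quantization: one scale per row, values mapped into [-127, 127].
          absmax = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
          return torch.round(x * (127.0 / absmax)).to(torch.int8), absmax.squeeze(1)

      def llm_int8_linear(A: torch.Tensor, W: torch.Tensor, threshold: float = 6.0):
          # A: (m, k) activations, W: (n, k) weights; returns A @ W.t() in mixed precision.
          outliers = (A.abs() >= threshold).any(dim=0)  # feature columns with large magnitudes
          out = torch.zeros(A.shape[0], W.shape[0], dtype=torch.float32, device=A.device)

          if (~outliers).any():
              # int8 path: quantize both operands row-wise, accumulate in int32, dequantize.
              A_q, a_scale = int8_rowwise_quant(A[:, ~outliers].float())
              W_q, w_scale = int8_rowwise_quant(W[:, ~outliers].float())
              acc = A_q.to(torch.int32) @ W_q.to(torch.int32).t()
              out += acc.float() * (a_scale.view(-1, 1) * w_scale.view(1, -1)) / (127.0 * 127.0)

          if outliers.any():
              # Sparse decomposition: the few outlier columns stay in floating point.
              out += A[:, outliers].float() @ W[:, outliers].float().t()

          return out.to(A.dtype)
      ```

      In bitsandbytes the int8 path runs in fused CUDA kernels (e.g. the kInt8VectorQuant kernel
      mentioned above) rather than this eager composition; the sketch only mirrors the arithmetic.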
  4. 23 Oct, 2024 1 commit
  5. 20 Sep, 2024 2 commits
  6. 26 Aug, 2024 1 commit
  7. 12 Jul, 2024 1 commit
    • Fix CUDA 12.5 build issue (#1273) · 85e01276
      Markus Hennerbichler authored
      pythonInterface.cpp depends on ops.cuh,
      which in turn depends on some Thrust headers.
      It is built as a C++ compilation unit,
      which is problematic because Thrust doesn't guarantee
      compatibility with host compilers.
      
      This started to cause issues with CUDA 12.5.
      There is no actual dependency on the Thrust headers,
      so they can be removed without other consequences.
  8. 23 Feb, 2024 1 commit
  9. 14 Feb, 2024 1 commit
  10. 05 Feb, 2024 1 commit
  11. 01 Feb, 2024 1 commit
  12. 31 Jan, 2024 1 commit
  13. 09 Dec, 2023 1 commit
  14. 19 Jul, 2023 1 commit
  15. 11 Jul, 2023 1 commit
  16. 10 Jul, 2023 5 commits
  17. 09 Jul, 2023 2 commits
  18. 08 Jul, 2023 2 commits
  19. 05 Jul, 2023 1 commit
  20. 04 Jul, 2023 2 commits
  21. 31 May, 2023 3 commits
  22. 24 May, 2023 1 commit
  23. 06 May, 2023 1 commit
  24. 02 May, 2023 7 commits