Commits · 4955d136ae083c2be1236d8915913166e1790aad · OpenDAS / bitsandbytes

13 Jun, 2025 1 commit
- Apply clang-format rules (#1678) · 4955d136
  Matthew Douglas authored Jun 13, 2025
  
  4955d136
04 Jun, 2025 1 commit

Matthew Douglas authored Jun 04, 2025

* Deprecation cleanup: remove histogram_scatter_add_2d

* Deprecation cleanup: vectorwise_mm_dequant

* Deprecation cleanup: vectorwise_quant

* Remove unused test

* Optimizer test cleanup

* Deprecations: remove estimate_quantiles, create_quantile_map

* Move deprecated test

849d9449

25 Mar, 2025 1 commit

PyTorch Custom Operator Integration (#1544) · e82f72b3

Matthew Douglas authored Mar 25, 2025



* Sketch out first custom op registration

* Add note

* Initial int8 op registration

* Cleanup some deprecated functions.

* Int8 ops updates; tests

* Implement 4bit quant/dequant ops

* Fix nested quant

* cleanup

* Test improvements

* Clean up and improve tests

* Add higher level custom op for int8 matmul + dequant + bias

* Add gemv 4bit custom op

* Cleanup

* Implement out kwarg overloads for custom ops

* Update PyTorch minimum to 2.1

* Deprecation updates

* Deprecation updates

* Cleanup; rename int8_linear_dequant -> int8_scaled_mm

* Bump min pytorch to 2.2

* cleanup

* Test reorganization

* Remove deprecated supports_igemmlt

* More cleanup

* Cleanup obsolete C++/CUDA code

* Cleanup

* Create 'default' backend for fallback op implementations; initial CPU nf4 work

* Stub out for multi-platform

* Fix serialization tests for torch>=2.6.0

* Add example for torch.compile e2e inference

* Test update

---------
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>

e82f72b3

05 Dec, 2024 1 commit

LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d

Matthew Douglas authored Dec 05, 2024



* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation

* Fix unintended change

* New naive mm_dequant kernel for row-major; cleanup

* fix

* int8 refactor: initial sparse decomp, cleanup

* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup

* int8: inference optimizations, some cleanup

* int8: more tests passing, cleanup

* int8 - more cleanup, most tests passing

* int8: specify CUDA stream for int8 ops

* perf: reduce overhead from getting cudaStream ptr

* Mark some functions for deprecation.

* int8 sparse decomp: small perf improvement

* update setup.py

* Update bitsandbytes/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn

* int8 cleanup

* Ignore ruff rule ISC001 (incompatible with formatter)

* add comment

* int8 more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8: rename / deprecate old fn signatures

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* type annotation

* format update

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Add comment to explain division optimization

* more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Type annotations, cleanup

* remove unused kernels; improved type annotations

* small perf optimization for single-GPU systems

* small perf optimization for single-GPU systems

* update docstrings

* Improve docs and tests

* Update docstring

* Update test

* add benchmarking script

* test cleanup: add deprecated marker, move benchmarks out

* Add int8 dequant function; misc improvements

* int8 matmul fallback for inner dims not divisible by 4

* improve register usage of kInt8VectorQuant - especially for A100/H100

* disable fail-fast for package build

* maxwell compat

* ptxas verbose

* docs update

* doc update

* backward fix

* Bugfix sparse decomp

* Int8 fix for PEFT OLoRA init

* Fix test for deprecated spmm_coo

* test improvement

* doc update

* typo

* doc cleanup

* docs

* add inference benchmark script

* Add benchmarks, doc update

---------
Co-authored-by: Aarni Koskela <akx@iki.fi>

81e6345d

20 Sep, 2024 1 commit

Add AdEMAMix optimizer (#1360) · d9645465

Matthew Douglas authored Sep 20, 2024

* Add AdEMAMix optimizer

* Add PagedAdEMAMix32bit, AdEMAMix32bit

* Add PagedAdEMAMix32bit, AdEMAMix32bit

* AdEMAMix: add support for alpha/beta3 scheduling

* Update paged AdEMAMix

d9645465

26 Aug, 2024 1 commit

CUDA source cleanup, refactor and fixes (#1328) · 6bef412a

Abhilash Majumder authored Aug 26, 2024

* remove kcompress

* fix initial template call

* fix function name

* remove vector load

* cleanup reduce & rearrange

* format

6bef412a

10 Jul, 2023 1 commit
- Added fp32 compute type for gemv_4bit. · 5fab6734
  Tim Dettmers authored Jul 09, 2023
  
  5fab6734
09 Jul, 2023 1 commit
- Added arbitrary data types; fixed a bug for small matrices. · 4b88d69d
  Tim Dettmers authored Jul 09, 2023
  
  4b88d69d
04 Jul, 2023 1 commit
- Initial 4-bit naive batch size 1, 81 vs 185. · f89ff93e
  Tim Dettmers authored Jul 03, 2023
  
  f89ff93e
06 May, 2023 1 commit
- Added paging. · ec38ba95
  Tim Dettmers authored May 06, 2023
  
  ec38ba95
30 Apr, 2023 1 commit
- 4-bit draft. · 21723f79
  Tim Dettmers authored Apr 29, 2023
  
  21723f79
29 Apr, 2023 5 commits
- Added bit template. · cad83994
  Tim Dettmers authored Apr 28, 2023
  
  cad83994
- New implementation for batch size 1. · f3e97ccb
  Tim Dettmers authored Apr 28, 2023
  
  f3e97ccb
- Added fp16 and thread/item template. · f6df4aef
  Tim Dettmers authored Apr 28, 2023
  
  f6df4aef
- Added template refactor. · 3aef7834
  Tim Dettmers authored Apr 28, 2023
  
  3aef7834
- First baseline kernel. · c1bfb210
  Tim Dettmers authored Apr 28, 2023
  
  c1bfb210
27 Apr, 2023 2 commits
- Added pipeline draft. · 9cab14a3
  Tim Dettmers authored Apr 27, 2023
  
  9cab14a3
- Best attempt at cutlass3. · 0afc8e9e
  Tim Dettmers authored Apr 26, 2023
  
  0afc8e9e
25 Apr, 2023 1 commit
- Initial template. · 6bfd7a40
  Tim Dettmers authored Apr 25, 2023
  
  6bfd7a40
02 Apr, 2023 1 commit
- First draft of NF4. · 64cc0592
  Tim Dettmers authored Apr 02, 2023
  
  64cc0592
10 Mar, 2023 1 commit
- always pass beta2 into all the 1state functions · 6c377b39
  Phil Wang authored Mar 10, 2023
  
  6c377b39
04 Feb, 2023 1 commit
- Added fp4 quant/dequant and dequant optimizations. · 3ac5840c
  Tim Dettmers authored Feb 04, 2023
  
  3ac5840c
08 Nov, 2022 1 commit
- Fixed bug in cpu quant; faster GPU dequant. · 08fa2e7b
  Tim Dettmers authored Nov 07, 2022
  
  08fa2e7b
27 Oct, 2022 1 commit
- Remove trailing whitespace & ensure newline at EOF · 1eec77d3
  Tom Aarsen authored Oct 27, 2022
  
  1eec77d3
16 Aug, 2022 1 commit
- Removed storage() from get_ptr; added boilerplate for bias dequant_mm. · 1ed2fa2f
  Tim Dettmers authored Aug 16, 2022
  
  1ed2fa2f
27 Jul, 2022 1 commit
- Working outlier extraction for Turing. · bcab99ec
  Tim Dettmers authored Jul 26, 2022
  
  bcab99ec
26 Jul, 2022 2 commits
- Boilerplate and test for extract_outliers. · cbb901ac
  Tim Dettmers authored Jul 26, 2022
  
  cbb901ac
- Some progress on build script; added multi-cuda install script. · 9268dc9d
  Tim Dettmers authored Jul 25, 2022
  
  9268dc9d
22 Jul, 2022 1 commit
- Most tests passing. · c771b3a7
  Tim Dettmers authored Jul 22, 2022
  
  c771b3a7
21 Oct, 2021 1 commit
- Initial plumbing for skip_zeros. · bb34fd50
  Tim Dettmers authored Oct 20, 2021
  
  bb34fd50
06 Oct, 2021 1 commit
- Initial commit · 74399248
  Tim Dettmers authored Oct 05, 2021
  
  74399248