1. 11 Jun, 2025 1 commit
  2. 08 Jun, 2025 1 commit
  3. 06 Jun, 2025 1 commit
  4. 04 Jun, 2025 1 commit
      Deprecation cleanup (#1669) · 849d9449
      Matthew Douglas authored
      * Deprecation cleanup: remove histogram_scatter_add_2d
      
      * Deprecation cleanup: vectorwise_mm_dequant
      
      * Deprecation cleanup: vectorwise_quant
      
      * Remove unused test
      
      * Optimizer test cleanup
      
      * Deprecations: remove estimate_quantiles, create_quantile_map
      
      * Move deprecated test
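      For readers following these removals: a common pattern is to ship a warning shim for a release or two before deleting a function outright. A minimal sketch of such a shim, using a hypothetical decorator (illustrative only, not the library's actual code):

        import warnings
        from functools import wraps

        def deprecated(msg: str):
            # Emit a FutureWarning at each call site before delegating.
            def decorator(fn):
                @wraps(fn)
                def wrapper(*args, **kwargs):
                    warnings.warn(msg, FutureWarning, stacklevel=2)
                    return fn(*args, **kwargs)
                return wrapper
            return decorator

        @deprecated("estimate_quantiles is deprecated and will be removed.")
        def estimate_quantiles(tensor, num_quantiles=256):
            ...  # original implementation kept until the cleanup release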
  5. 03 Jun, 2025 1 commit
  6. 02 Jun, 2025 2 commits
      Add CPU + IPEX to nightly CI (#1667) · 318a86e3
      Matthew Douglas authored
      * Tests: add linux x64 cpu+ipex to nightly CI workflow
      
      * typo
      
      * Tests: guard linear8bit compile test for ipex cpu issue
      Fix CI regression (#1666) · 945f7c1d
      Matthew Douglas authored
      * Tests: xfail opcheck for 4bit quantization with floating storage dtypes
      
      * Tests: skip test_gemv_eye_4bit on CPU with bf16 when not supported by torch
  7. 28 May, 2025 1 commit
  8. 24 May, 2025 2 commits
      Add torch.compile tests (#1648) · 9f858294
      Matthew Douglas authored
      * Add torch.compile tests
      
      * Tests: WA aarch64 CPU regressions for torch 2.6.0; add Windows torch==2.7.0+cu118 test config
      
      * Tests: skip torch.compile for cuda on windows
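      Tests of this kind typically compile a module and check parity with eager mode. A minimal sketch, assuming a CUDA device and using a plain nn.Linear as a stand-in for a bitsandbytes layer:

        import torch

        def test_compile_matches_eager():
            torch._dynamo.reset()  # avoid reusing compiled artifacts across tests
            model = torch.nn.Linear(64, 64, device="cuda", dtype=torch.float16)
            x = torch.randn(8, 64, device="cuda", dtype=torch.float16)

            expected = model(x)  # eager reference
            actual = torch.compile(model, fullgraph=True)(x)

            torch.testing.assert_close(actual, expected)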
      General cleanup & test improvements (#1646) · 503d243e
      Matthew Douglas authored
      * General cleanup & test improvements
      
      * Tests: WA numpy 2 compat issue for torch<2.3
      
      * Tests: update aarch64 cpu min torch version
  9. 21 May, 2025 1 commit
  10. 19 May, 2025 1 commit
      CI runner updates (#1643) · cdcae8d3
      Matthew Douglas authored
      * Test g5g runner
      
      * Switch L4 to L40S runner; swap GitHub Linux T4 runner for AWS g4dn
      
      * Run tests on last 2 pytorch stable releases
  11. 13 May, 2025 1 commit
  12. 29 Apr, 2025 1 commit
      Set up nightly CI for unit tests (#1619) · a5dd01bb
      Matthew Douglas authored
      * Run unit tests on GH Actions
      
      * fix
      
      * trigger workflow
      
      * Update
      
      * Run tests nightly
      
      * Disable paged optimizer test on Windows
      
      * Skip unit tests on Windows for CUDA 12.x (driver on runner is too old)
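      The Windows exclusions above are ordinary pytest platform guards. A sketch of the shape such a guard takes (the test name and body are placeholders, not the suite's actual tests):

        import platform
        import pytest

        @pytest.mark.skipif(
            platform.system() == "Windows",
            reason="paged optimizers are not supported on Windows",
        )
        def test_paged_optimizer_step():
            ...  # exercise the paged optimizer here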
  13. 28 Apr, 2025 1 commit
  14. 22 Apr, 2025 1 commit
      Updates for device agnosticism (#1601) · 1088ec52
      Matthew Douglas authored
      * Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit
      
      * Make test suite more device-agnostic
      
      * Additional device agnostic tests
      
      * Additional device agnosticism for tests
      
      * Add BNB_TEST_DEVICE env var to manually select device for unit tests
      
      * Small bugfix for int8 test
      
      * Exclude backward() from code coverage reports
      
      * Params4bit: don't try to quantize when moving to meta device
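      The BNB_TEST_DEVICE variable mentioned above lets a run pin the device under test. A sketch of how a conftest fixture might consume it (the fixture is illustrative; the suite's actual wiring may differ):

        import os
        import pytest
        import torch

        @pytest.fixture
        def device() -> str:
            # Honor an explicit override, else prefer an accelerator if present.
            override = os.environ.get("BNB_TEST_DEVICE")
            if override:
                return override
            return "cuda" if torch.cuda.is_available() else "cpu"

        def test_tensor_on_selected_device(device):
            t = torch.ones(4, device=device)
            assert t.device.type == torch.device(device).type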
  15. 27 Mar, 2025 2 commits
  16. 25 Mar, 2025 1 commit
      PyTorch Custom Operator Integration (#1544) · e82f72b3
      Matthew Douglas authored
      
      
      * Sketch out first custom op registration
      
      * Add note
      
      * Initial int8 op registration
      
      * Cleanup some deprecated functions.
      
      * Int8 ops updates; tests
      
      * Implement 4bit quant/dequant ops
      
      * Fix nested quant
      
      * cleanup
      
      * Test improvements
      
      * Clean up and improve tests
      
      * Add higher level custom op for int8 matmul + dequant + bias
      
      * Add gemv 4bit custom op
      
      * Cleanup
      
      * Implement out kwarg overloads for custom ops
      
      * Update PyTorch minimum to 2.1
      
      * Deprecation updates
      
      * Cleanup; rename int8_linear_dequant -> int8_scaled_mm
      
      * Bump min pytorch to 2.2
      
      * cleanup
      
      * Test reorganization
      
      * Remove deprecated supports_igemmlt
      
      * More cleanup
      
      * Cleanup obsolete C++/CUDA code
      
      * Cleanup
      
      * Create 'default' backend for fallback op implementations; initial CPU nf4 work
      
      * Stub out for multi-platform
      
      * Fix serialization tests for torch>=2.6.0
      
      * Add example for torch.compile e2e inference
      
      * Test update
      
      ---------
      Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
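      The registrations this PR introduces are built on torch.library, which gives each op a schema, a real implementation, and a shape-only "fake" implementation for torch.compile tracing. A minimal sketch of the pattern using the torch>=2.4 decorator API and a made-up demo namespace (not bitsandbytes' actual schemas):

        import torch

        @torch.library.custom_op("demo::dequantize_scale", mutates_args=())
        def dequantize_scale(a: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
            # Eager implementation: rescale an int8 tensor back to float.
            return a.to(torch.float32) * scale

        @dequantize_scale.register_fake
        def _(a: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
            # Metadata-only impl so the op can be traced without running.
            return torch.empty_like(a, dtype=torch.float32)

        x = (torch.randn(4, 4) * 10).to(torch.int8)
        out = torch.ops.demo.dequantize_scale(x, torch.tensor(0.1))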
  17. 05 Dec, 2024 1 commit
      LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d
      Matthew Douglas authored
      
      
      * Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
      
      * Fix unintended change
      
      * New naive mm_dequant kernel for row-major; cleanup
      
      * fix
      
      * int8 refactor: initial sparse decomp, cleanup
      
      * Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
      
      * int8: inference optimizations, some cleanup
      
      * int8: more tests passing, cleanup
      
      * int8 - more cleanup, most tests passing
      
      * int8: specify CUDA stream for int8 ops
      
      * perf: reduce overhead from getting cudaStream ptr
      
      * Mark some functions for deprecation.
      
      * int8 sparse decomp: small perf improvement
      
      * update setup.py
      
      * Update bitsandbytes/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
      
      * int8 cleanup
      
      * Ignore ruff rule ISC001 (incompatible with formatter)
      
      * add comment
      
      * int8 more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * int8: rename / deprecate old fn signatures
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * type annotation
      
      * format update
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Add comment to explain division optimization
      
      * more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Type annotations, cleanup
      
      * remove unused kernels; improved type annotations
      
      * small perf optimization for single-GPU systems
      
      * update docstrings
      
      * Improve docs and tests
      
      * Update docstring
      
      * Update test
      
      * add benchmarking script
      
      * test cleanup: add deprecated marker, move benchmarks out
      
      * Add int8 dequant function; misc improvements
      
      * int8 matmul fallback for inner dims not divisible by 4
      
      * improve register usage of kInt8VectorQuant - especially for A100/H100
      
      * disable fail-fast for package build
      
      * maxwell compat
      
      * ptxas verbose
      
      * docs update
      
      * doc update
      
      * backward fix
      
      * Bugfix sparse decomp
      
      * Int8 fix for PEFT OLoRA init
      
      * Fix test for deprecated spmm_coo
      
      * test improvement
      
      * doc update
      
      * typo
      
      * doc cleanup
      
      * docs
      
      * add inference benchmark script
      
      * Add benchmarks, doc update
      
      ---------
      Co-authored-by: Aarni Koskela <akx@iki.fi>
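      For orientation, the row-major int8 path refactored here follows the standard absmax scheme: quantize each row to int8, accumulate the matmul in int32, then dequantize with the product of the row scales. A numerically illustrative sketch in pure PyTorch (the int8_scaled_mm below is a stand-in echoing the op renamed in this PR, not its actual CUDA implementation):

        import torch

        def quantize_rowwise(x):
            # Per-row absmax scaling into the int8 range [-127, 127].
            absmax = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
            return torch.round(x * (127.0 / absmax)).to(torch.int8), absmax

        def int8_scaled_mm(a, b, scale_a, scale_b):
            # int32 accumulation, then dequantize with the outer product of scales.
            acc = a.to(torch.int32) @ b.to(torch.int32).T
            return acc.to(torch.float32) * (scale_a * scale_b.T) / (127.0 * 127.0)

        A, B = torch.randn(8, 64), torch.randn(16, 64)
        qa, sa = quantize_rowwise(A)
        qb, sb = quantize_rowwise(B)
        ref = A @ B.T
        err = (int8_scaled_mm(qa, qb, sa, sb) - ref).norm() / ref.norm()
        assert err < 0.02  # quantization error stays small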
  18. 20 Sep, 2024 2 commits
  19. 14 Aug, 2024 1 commit
  20. 06 Aug, 2024 1 commit
  21. 15 Jul, 2024 1 commit
  22. 30 May, 2024 1 commit
      FIX Make Int8Params deepcopy-able · ed99b3c1
      Benjamin Bossan authored
      This requires implementing the __deepcopy__ method in Int8Params.
      Moreover, there was an issue in the Linear8BitLT constructor that would
      assign instance attributes to the class, which is now fixed.
      
      Please review carefully that this does not impact existing code.
      
      Tests that I ran:
      
      - pytest tests/test_linear8bitlt.py
      - in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_gpu_examples.py
      - in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_common_gpu.py
      - in transformers: RUN_SLOW=1 python -m pytest tests/quantization/bnb -x
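      The heart of the fix is an explicit __deepcopy__ on the parameter subclass, so copy.deepcopy rebuilds the subclass with its quantization state instead of decaying to a plain Parameter. A simplified sketch of the idea (the scale field is illustrative, not the exact Int8Params attributes):

        import copy
        import torch

        class QuantParam(torch.nn.Parameter):
            def __new__(cls, data, scale=None, requires_grad=False):
                self = super().__new__(cls, data, requires_grad=requires_grad)
                self.scale = scale  # stand-in for quantization state
                return self

            def __deepcopy__(self, memo):
                # Reconstruct the subclass so its extra attributes survive.
                new = type(self)(
                    self.data.clone(),
                    scale=copy.deepcopy(self.scale, memo),
                    requires_grad=self.requires_grad,
                )
                memo[id(self)] = new
                return new

        p = QuantParam(torch.zeros(2, 2), scale=torch.ones(2))
        q = copy.deepcopy(p)
        assert q.scale is not p.scale and torch.equal(q, p)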
  23. 29 May, 2024 1 commit
  24. 02 Apr, 2024 1 commit
  25. 29 Mar, 2024 1 commit
  26. 13 Mar, 2024 2 commits
  27. 11 Mar, 2024 2 commits
  28. 06 Mar, 2024 1 commit
  29. 05 Mar, 2024 1 commit
  30. 21 Feb, 2024 3 commits
  31. 05 Feb, 2024 1 commit
  32. 01 Feb, 2024 1 commit