Commits · a23026c8970b984d08c26bfc2932431b7473b3e9 · OpenDAS / bitsandbytes

11 Jun, 2025 1 commit

[Triton/XPU] Support 4bit dequantization logic on Triton (#1629) · a23026c8

Dmitrii Makarenko authored Jun 11, 2025



* [xpu/triton] Add Triton dequantization kernel

This PR adds an XPU backend and a Triton kernel for dequantizing the nf4 dtype.
Triton is an optional import.
Tests:
	tests/test_functional.py::TestQuantize4BitFunctional supported nf4/fp4 cases
	tests/test_functional.py::Test8BitBlockwiseQuantizeFunctional
	  (implemented quantize_blockwise with a binary search, which runs faster on XPU)
	tests/test_linear4bit.py
Signed-off-by: Dmitrii Makarenko <dmitrii.makarenko@intel.com>

* align with ipex code

* enable test for ipex

* test_kbit_backprop: skip no longer needed

* remove unused

---------
Signed-off-by: Dmitrii Makarenko <dmitrii.makarenko@intel.com>

a23026c8
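
The commit message above mentions a blockwise quantizer that uses a binary search, and a 4-bit dequantization path. The log does not include the kernel itself, so the following is only a minimal pure-Python sketch of the general idea, not the Triton code from the PR. The code table `CODE` is a hypothetical uniform table rather than the real nf4/fp4 values, the helper names are illustrative, and the nibble order (first element in the high nibble) is an assumption.

```python
from bisect import bisect_left

# Hypothetical sorted 16-entry code table, uniform in [-1, 1].
# These are NOT the real nf4/fp4 values; they only illustrate the lookup.
CODE = [-1.0 + 2.0 * i / 15 for i in range(16)]


def nearest_index(code, x):
    """Binary-search a sorted code table for the entry closest to x."""
    pos = bisect_left(code, x)
    if pos == 0:
        return 0
    if pos == len(code):
        return len(code) - 1
    # x lies between code[pos - 1] and code[pos]; pick the closer one.
    return pos if code[pos] - x < x - code[pos - 1] else pos - 1


def quantize_blockwise(values, blocksize=4):
    """Return (packed bytes, per-block absmax), two 4-bit indices per byte."""
    absmax, idx = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        scale = max(abs(v) for v in block) or 1.0  # all-zero block: avoid dividing by 0
        absmax.append(scale)
        idx.extend(nearest_index(CODE, v / scale) for v in block)
    if len(idx) % 2:
        idx.append(0)  # pad to a whole byte; trimmed again on dequantize
    # Assumption: the first element of each pair goes in the high nibble.
    packed = bytes((hi << 4) | lo for hi, lo in zip(idx[::2], idx[1::2]))
    return packed, absmax


def dequantize_blockwise(packed, absmax, n, blocksize=4):
    """Unpack nibbles, look each one up in CODE, and rescale by the block absmax."""
    idx = []
    for byte in packed:
        idx.extend((byte >> 4, byte & 0x0F))
    return [CODE[idx[i]] * absmax[i // blocksize] for i in range(n)]
```

Because the table is sorted, each lookup costs O(log n) comparisons instead of a scan over all entries, which is presumably where the speedup mentioned in the commit comes from.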

02 Jun, 2025 1 commit

Fix CI regression (#1666) · 945f7c1d

Matthew Douglas authored Jun 02, 2025

* Tests: xfail opcheck for 4bit quantization with floating storage dtypes

* Tests: skip test_gemv_eye_4bit on CPU with bf16 when not supported by torch

945f7c1d

28 May, 2025 1 commit

Enable CPU/XPU native and ipex path (#1628) · aaa71d7e

jiqing-feng authored May 29, 2025



* enable ipex
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cpu 8bit quantization
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix int8 and nf4 cpu inference
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add cpu fp4 and rem
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix dequantize nf4 xpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix ipex op
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix dequantize nf4 name
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix dequantize nf4 ipex
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix matmul8bitfp
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable cpu tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix quantize blockwise output shape
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix quant_storage bf16 and gemv cpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cpu tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix lib
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip xpu dequantize blockwise op check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix matmul8bit
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip not used function tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix matmul8bit fp
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check ipex before MatMul8bitFp
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update ipex install guide
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update install guide
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix error log
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix error log
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update comment
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* move torch op to default
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert ipex check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix code table device
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix code table device
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu ops
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

aaa71d7e

24 May, 2025 1 commit

General cleanup & test improvements (#1646) · 503d243e

Matthew Douglas authored May 23, 2025

* General cleanup & test improvements

* Tests: work around NumPy 2 compat issue for torch<2.3

* Tests: update aarch64 cpu min torch version

503d243e

13 May, 2025 1 commit

Improvements to test suite (#1636) · 42bc7291

Matthew Douglas authored May 13, 2025

* Improvements for testing suite

* Add workflow for macOS arm64 CPU tests

42bc7291

28 Apr, 2025 1 commit

Add simple op implementations for CPU (#1602) · 10b9d4cd

Matthew Douglas authored Apr 28, 2025

* Additional 4bit CPU ops

* Implement additional device-agnostic ops and test updates

* More test fixes

* int8 tests passing

* Fix feature flag for multi_backend

10b9d4cd

22 Apr, 2025 1 commit

Updates for device agnosticism (#1601) · 1088ec52

Matthew Douglas authored Apr 22, 2025

* Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit

* Make test suite more device-agnostic

* Additional device agnostic tests

* Additional device agnosticism for tests

* Add BNB_TEST_DEVICE env var to manually select device for unit tests

* Small bugfix for int8 test

* Exclude backward() from code coverage reports

* Params4bit: don't try to quantize when moving to meta device

1088ec52
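
The commit above adds a BNB_TEST_DEVICE environment variable for choosing the unit-test device by hand. The log does not show how the test suite reads it, so this is only a sketch of one plausible pattern. The helper name `select_test_device`, its fallback order, and the accepted values such as "cpu" are assumptions, not the project's actual code.

```python
import os


def select_test_device(available=("cuda", "xpu", "cpu")):
    """Hypothetical helper: honor BNB_TEST_DEVICE if set, else use the first available device."""
    override = os.environ.get("BNB_TEST_DEVICE")
    if override:
        return override
    # No override: fall back to the first entry of the caller-supplied list.
    return available[0]
```

Under these assumptions, exporting BNB_TEST_DEVICE=cpu before starting the test run would force the tests onto the CPU even when an accelerator is present.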

27 Mar, 2025 1 commit

Test cleanup (#1576) · 8b6fe9ee

Matthew Douglas authored Mar 27, 2025

* Testing cleanup

* More test cleanup

* Additional deprecations/removals.

* Skip benchmark, deprecated, slow tests by default

8b6fe9ee

25 Mar, 2025 1 commit

PyTorch Custom Operator Integration (#1544) · e82f72b3

Matthew Douglas authored Mar 25, 2025



* Sketch out first custom op registration

* Add note

* Initial int8 op registration

* Cleanup some deprecated functions.

* Int8 ops updates; tests

* Implement 4bit quant/dequant ops

* Fix nested quant

* cleanup

* Test improvements

* Clean up and improve tests

* Add higher level custom op for int8 matmul + dequant + bias

* Add gemv 4bit custom op

* Cleanup

* Implement out kwarg overloads for custom ops

* Update PyTorch minimum to 2.1

* Deprecation updates

* Cleanup; rename int8_linear_dequant -> int8_scaled_mm

* Bump min pytorch to 2.2

* cleanup

* Test reorganization

* Remove deprecated supports_igemmlt

* More cleanup

* Cleanup obsolete C++/CUDA code

* Cleanup

* Create 'default' backend for fallback op implementations; initial CPU nf4 work

* Stub out for multi-platform

* Fix serialization tests for torch>=2.6.0

* Add example for torch.compile e2e inference

* Test update

---------
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>

e82f72b3