- 15 Sep, 2025 1 commit
Liu Xiaoli authored
* Add SYCL Kernels for XPU backend
* fix transpose
* fix log and format
* revert cpu changes
* clean ipex_xpu
* clean ipex import
* fix ipex cpu import
* fix typo
* fix comments
* refine gemv_4bit kernel
* enable FP4 for dequant_4bit and gemv_4bit
* refine FP4 dequantization performance
* remove check for better performance
* fix doc
* clean code
* fix tests
* rm comments
* fix memory issue
* fix ut failure
* adjust threshold
* fix xpu check
* change test_functional check
* fix test_module
* fix device check
* fix tests
* Enable Windows build and refine code
* fix xpu log
* remove ipex entirely
* fix cpu int8 CB
* fix lint
* fix logs (#12)
  * fix logs
  * fix format
* Fix sycl lint error and tests (#13)
  * fix sycl nd
  * fix tests
* skip typo check for xpu kernel codes (#14)
  * skip test for xpu ops
  * fix lint
  * skip typo for xpu
* register triton kernel for quantization (#15)
* Fix version comparison issue (#18)

  # Description
  The version comparison expression mistakenly references the `.release` property of the parsed version object, which leads to a comparison between a tuple and a `Version` instance.

  # Error message
  ```
  The 8-bit optimizer is not available on your device, only available on CUDA for now.
  🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
  Traceback (most recent call last):
    File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/unsloth_validation/run.py", line 1, in <module>
      import unsloth
    File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/__init__.py", line 235, in <module>
      from .models import *
    File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
      from .llama import FastLlamaModel
    File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/llama.py", line 23, in <module>
      from ._utils import *
    File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/_utils.py", line 89, in <module>
      from unsloth_zoo.patching_utils import (
    File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth_zoo/patching_utils.py", line 629, in <module>
      import transformers.integrations.bitsandbytes
    File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 20, in <module>
      import bitsandbytes as bnb
    File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/__init__.py", line 39, in <module>
      from .backends.xpu import ops as xpu_ops
    File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/backends/xpu/ops.py", line 17, in <module>
      if version.parse(torch.__version__).release >= version.parse("2.9"):
  TypeError: '>=' not supported between instances of 'tuple' and 'Version'
  ```

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Er-Xin (Edwin) Shang <shangerxin@hotmail.com>
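The #18 fix comes down to comparing like types with `packaging.version`. A minimal reproduction of the failure and the corrected comparison (a sketch; the version string "2.8.0" is a placeholder, not a value from the repo):

```python
from packaging import version

v = version.parse("2.8.0")  # placeholder for torch.__version__

# Buggy pattern from the traceback: `.release` is a plain tuple such as
# (2, 8, 0), and a tuple cannot be ordered against a Version object.
try:
    v.release >= version.parse("2.9")
    raised = False
except TypeError:
    raised = True

# Fixed: compare Version to Version (or tuple to tuple).
version_ok = v >= version.parse("2.9")                  # False
release_ok = v.release >= version.parse("2.9").release  # False
```

Either fixed form avoids the `TypeError`; the tuple-to-tuple variant keeps the `.release` semantics (ignoring pre-release suffixes such as `2.9.0.dev`).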
- 17 Jun, 2025 1 commit

Matthew Douglas authored
* Setup XPU CI
* CI: expand XPU matrix
* skip some fp4 tests on hpu
* skip gemv tests on hpu
* Additional test patches for HPU
* HPU test updates
* Format

- 02 Jun, 2025 1 commit

Matthew Douglas authored
* Tests: add linux x64 cpu+ipex to nightly CI workflow
* typo
* Tests: guard linear8bit compile test for ipex cpu issue

- 24 May, 2025 1 commit

Matthew Douglas authored
* Add torch.compile tests
* Tests: WA aarch64 CPU regressions for torch 2.6.0; add Windows torch==2.7.0+cu118 test config
* Tests: skip torch.compile for cuda on windows

- 28 Apr, 2025 1 commit

Matthew Douglas authored
* Additional 4bit CPU ops
* Implement additional device-agnostic ops and test updates
* More test fixes
* int8 tests passing
* Fix feature flag for multi_backend

- 22 Apr, 2025 1 commit

Matthew Douglas authored
* Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit
* Make test suite more device-agnostic
* Additional device-agnostic tests
* Add BNB_TEST_DEVICE env var to manually select device for unit tests
* Small bugfix for int8 test
* Exclude backward() from code coverage reports
* Params4bit: don't try to quantize when moving to meta device

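The BNB_TEST_DEVICE variable mentioned above overrides the device the test suite targets. A minimal sketch of such a selector (the helper name and the allow-list are hypothetical, not the library's actual code):

```python
import os

ALLOWED = ("cpu", "cuda", "xpu")  # hypothetical allow-list for this sketch

def get_test_device() -> str:
    # Read the override from the environment, defaulting to CPU.
    device = os.environ.get("BNB_TEST_DEVICE", "cpu")
    if device.split(":")[0] not in ALLOWED:
        raise ValueError(f"unsupported BNB_TEST_DEVICE: {device!r}")
    return device

os.environ["BNB_TEST_DEVICE"] = "xpu"
selected = get_test_device()  # "xpu"
```

Suffixes like `cuda:1` pass the check because only the device type before the colon is validated.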
- 25 Mar, 2025 1 commit

Matthew Douglas authored
* Sketch out first custom op registration
* Add note
* Initial int8 op registration
* Cleanup some deprecated functions
* Int8 ops updates; tests
* Implement 4bit quant/dequant ops
* Fix nested quant
* Test improvements
* Clean up and improve tests
* Add higher level custom op for int8 matmul + dequant + bias
* Add gemv 4bit custom op
* Implement out kwarg overloads for custom ops
* Update PyTorch minimum to 2.1
* Deprecation updates
* Cleanup; rename int8_linear_dequant -> int8_scaled_mm
* Bump min pytorch to 2.2
* Test reorganization
* Remove deprecated supports_igemmlt
* Cleanup obsolete C++/CUDA code
* Create 'default' backend for fallback op implementations; initial CPU nf4 work
* Stub out for multi-platform
* Fix serialization tests for torch>=2.6.0
* Add example for torch.compile e2e inference
* Test update

Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>

- 05 Dec, 2024 1 commit

Matthew Douglas authored
* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
* Fix unintended change
* New naive mm_dequant kernel for row-major; cleanup
* int8 refactor: initial sparse decomp, cleanup
* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
* int8: inference optimizations, more tests passing, cleanup
* int8: specify CUDA stream for int8 ops
* perf: reduce overhead from getting cudaStream ptr
* Mark some functions for deprecation
* int8 sparse decomp: small perf improvement
* update setup.py
* Apply review suggestions to functional.py and autograd/_functions.py (with Aarni Koskela)
* int8: perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
* Ignore ruff rule ISC001 (incompatible with formatter)
* int8: rename / deprecate old fn signatures
* Add comment to explain division optimization
* Type annotations, cleanup
* remove unused kernels; improved type annotations
* small perf optimization for single-GPU systems
* update docstrings
* Improve docs and tests
* add benchmarking script
* test cleanup: add deprecated marker, move benchmarks out
* Add int8 dequant function; misc improvements
* int8 matmul fallback for inner dims not divisible by 4
* improve register usage of kInt8VectorQuant - especially for A100/H100
* disable fail-fast for package build
* maxwell compat
* ptxas verbose
* docs updates
* backward fix
* Bugfix sparse decomp
* Int8 fix for PEFT OLoRA init
* Fix test for deprecated spmm_coo
* test improvement
* add inference benchmark script
* Add benchmarks, doc update

Co-authored-by: Aarni Koskela <akx@iki.fi>

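The int8 path refactored above is built on row-wise absmax quantization. A pure-Python sketch of the idea (illustrative only; the library does this in fused CUDA kernels, and the function names here are made up):

```python
def quantize_rowwise(row):
    # Scale the row so its largest magnitude maps to 127, then round
    # to the nearest integer code in [-127, 127].
    absmax = max(abs(v) for v in row) or 1.0
    return [round(v * 127.0 / absmax) for v in row], absmax

def dequantize_rowwise(codes, absmax):
    # Inverse transform: int8 codes back to approximate floats.
    return [c * absmax / 127.0 for c in codes]

codes, scale = quantize_rowwise([0.5, -1.0, 0.25])
restored = dequantize_rowwise(codes, scale)
```

The round trip is lossy (each value is recovered to within absmax/254), which is why the surrounding commits spend effort on sparse decomposition of outlier columns.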
- 15 Jul, 2024 1 commit

Vladimir Malinovskii authored
* fixed test_4bit_warnings on cpu-only platforms
* fixed linear8bit-based tests for cpu-only platforms

- 30 May, 2024 1 commit

Benjamin Bossan authored
This requires implementing the __deepcopy__ method in Int8Params. Moreover, there was an issue in the Linear8bitLt constructor that would assign instance attributes to the class, which is now fixed. Please review carefully that this does not impact existing code. Tests that I ran:
- pytest tests/test_linear8bitlt.py
- in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_gpu_examples.py
- in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_common_gpu.py
- in transformers: RUN_SLOW=1 python -m pytest tests/quantization/bnb -x

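The constructor issue described above, assigning instance state onto the class, makes every instance share it. A minimal illustration of the bug pattern and the fix (generic Python, not the actual Linear8bitLt code):

```python
class Leaky:
    weights = []  # class attribute: one list shared by ALL instances

    def __init__(self, value):
        # Bug pattern: mutating class-level state from the constructor,
        # so every instance sees every other instance's values.
        Leaky.weights.append(value)

class Fixed:
    def __init__(self, value):
        # Fix: assign on self so each instance owns its own state.
        self.weights = [value]

a, b = Leaky(1), Leaky(2)
c, d = Fixed(1), Fixed(2)
# a.weights and b.weights are the same leaked [1, 2] list;
# c.weights == [1] and d.weights == [2] stay independent.
```

This is also the kind of shared state that breaks deepcopy: copying one `Leaky` instance drags along data belonging to all of them.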
- 13 Mar, 2024 1 commit

Ruff authored

- 06 Mar, 2024 1 commit

Aarni Koskela authored

- 05 Mar, 2024 1 commit

rdyro authored

- 01 Feb, 2024 1 commit

Aarni Koskela authored
* test_nvidia_transform: fix variable reference (`out_order` is the global parametrization list, not the test fixture argument)
* Make `parametrize` use more idiomatic
* Use a more deterministic helper for `dim*` determination
* Convert NO_CUBLASLT errors into skips too
* Mark slow and benchmark tests as such (allows `-k "not benchmark"`)

- 30 Jan, 2024 1 commit
Aarni Koskela authored
* Adjust Ruff configuration
  * do not autofix always
  * be less strict around tests and benchmarks
  * adjust ignores for now
* Ruff: autofix I and F401
* Apply ruff autofixes
* Fix RUF013 complaint
* Fix mutable default in replace_linear
* Don't use bare except
* Wrap bitsandbytes.__main__ entrypoint in function; fix "sensible" typo
* Fix ruff B008 (function call in arguments)
* Add ruff noqas as suitable
* Fix RUF005 (splat instead of concatenating)
* Fix B018 (useless expression)
* Add pre-commit configuration + GitHub Actions lint workflow
* Fix unused `e` in bitsandbytes/__main__.py
* fix merge conflict resolution error
* run pre-commit hook

Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>

- 24 Jan, 2024 1 commit
Aarni Koskela authored
* implicitly skip any test that implicitly uses CUDA on a non-CUDA box
* add a `requires_cuda` fixture

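The `requires_cuda` fixture described above can be sketched as follows (assuming pytest; the availability helper is a stand-in, while the real fixture consults torch directly):

```python
import pytest

def cuda_available() -> bool:
    # Stand-in check; returns False when torch is absent entirely.
    try:
        import torch
        return bool(torch.cuda.is_available())
    except ImportError:
        return False

@pytest.fixture
def requires_cuda():
    # Any test requesting this fixture is skipped on a non-CUDA box,
    # instead of erroring when it first touches a CUDA API.
    if not cuda_available():
        pytest.skip("this test requires a CUDA device")
```

A test then just declares the dependency: `def test_gemm(requires_cuda): ...` and never needs its own availability guard.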
- 21 Mar, 2023 1 commit

Max Ryabinin authored

- 25 Feb, 2023 1 commit

Max Ryabinin authored

- 21 Feb, 2023 1 commit

Max Ryabinin authored

- 02 Feb, 2023 1 commit

Tim Dettmers authored