  1. 02 Aug, 2025 1 commit
  2. 20 Jun, 2025 1 commit
    • Enable ROCm backend with custom ops integration (#1683) · 888788d7
      pnunna93 authored
      
      
      * Port ROCm changes from multi-backend-refactor branch
      
      * Iterative updates to ops.py and functional.py
      
      * Update tests: test_ops.py, test_functional.py, test_linear4bit.py, test_cuda_setup_evaluator.py
      
      * Update cextension.py, cuda_specs.py, modules.py, helpers.py, main.py
      
      * Update python-package.yml; create build-rocm.sh
      
      * Fix trailing whitespace; remove conflicts.diff
      
      * Update for hipblasVersionMajor >= 3
      
      * Lint fixes
      
      * Update pythonInterface.cpp; revert permissions change; fix indentation
      
      * Update HIP sources: kernels_hip.cuh, kernels.hip, ops.hip, ops_hip.cuh
      
      * Update CMakeLists.txt
      
      ---------
      Co-authored-by: MISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
      Co-authored-by: MISHANMAUYRA <mishanmaurya31081@gmail.com>
      Co-authored-by: amcamd <andrew.chapman@amd.com>
      Co-authored-by: Prasanth Nunna <root@banff-cyxtera-s78-1.amd.com>
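      For context, a minimal sketch of how a ROCm build of PyTorch can be told apart from a CUDA build at runtime. The torch.version.hip check is standard PyTorch; the dispatch function around it is illustrative and not the actual bitsandbytes backend-selection code.
      
          import torch
      
          def detect_gpu_backend() -> str:
              # On ROCm wheels torch.version.hip is a version string (e.g. "6.0.x");
              # on CUDA wheels it is None and torch.version.cuda is set instead.
              if torch.version.hip is not None:
                  return "rocm"
              if torch.version.cuda is not None:
                  return "cuda"
              return "cpu"
      
          print(detect_gpu_backend())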
  3. 16 Jun, 2025 1 commit
  4. 08 Jun, 2025 1 commit
  5. 28 May, 2025 1 commit
  6. 24 May, 2025 1 commit
    • Add torch.compile tests (#1648) · 9f858294
      Matthew Douglas authored
      * Add torch.compile tests
      
      * Tests: work around aarch64 CPU regressions for torch 2.6.0; add Windows torch==2.7.0+cu118 test config
      
      * Tests: skip torch.compile for cuda on windows
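      A rough illustration of the shape such a test can take; this is a generic torch.compile smoke test, not the code added in #1648.
      
          import torch
      
          def test_compile_matches_eager():
              model = torch.nn.Linear(16, 8)
              x = torch.randn(4, 16)
      
              eager_out = model(x)
              compiled = torch.compile(model)  # requires torch >= 2.0
              compiled_out = compiled(x)
      
              # Compilation should not change numerics relative to eager mode.
              torch.testing.assert_close(compiled_out, eager_out)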
  7. 28 Apr, 2025 1 commit
  8. 22 Apr, 2025 1 commit
    • Updates for device agnosticism (#1601) · 1088ec52
      Matthew Douglas authored
      * Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit
      
      * Make test suite more device-agnostic
      
      * Additional device agnostic tests
      
      * Additional device agnosticism for tests
      
      * Add BNB_TEST_DEVICE env var to manually select device for unit tests
      
      * Small bugfix for int8 test
      
      * Exclude backward() from code coverage reports
      
      * Params4bit: don't try to quantize when moving to meta device
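      A plausible shape for the BNB_TEST_DEVICE hook described above, sketched as a pytest fixture; the repository's actual implementation may differ.
      
          import os
      
          import pytest
          import torch
      
          @pytest.fixture
          def device() -> str:
              # Manual override, e.g.:  BNB_TEST_DEVICE=xpu pytest tests/
              requested = os.environ.get("BNB_TEST_DEVICE")
              if requested:
                  return requested
              return "cuda" if torch.cuda.is_available() else "cpu"
      
          def test_round_trip(device):
              x = torch.ones(4, device=device)
              assert x.sum().item() == 4.0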
  9. 29 May, 2024 1 commit
  10. 13 Mar, 2024 1 commit
  11. 06 Mar, 2024 1 commit
  12. 05 Mar, 2024 1 commit
  13. 21 Feb, 2024 1 commit
  14. 01 Feb, 2024 1 commit
    • Test improvements (#1001) · 2336a45c
      Aarni Koskela authored
      * test_nvidia_transform: fix variable reference
      
      `out_order` is the global parametrization list, not the test fixture argument
      
      * Make `parametrize` use more idiomatic
      
      * Use a more deterministic helper for `dim*` determination
      
      * Convert NO_CUBLASLT errors into skips too
      
      * Mark slow and benchmark tests as such (allows `-k "not benchmark"`)
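      For readers unfamiliar with the patterns referenced above, a generic example of idiomatic pytest parametrization and a deselectable benchmark marker; all identifiers here are illustrative, not the actual test code from #1001.
      
          import pytest
      
          # The custom marker must be registered (pytest.ini / pyproject.toml):
          #   markers = ["benchmark: long-running performance tests"]
      
          @pytest.mark.parametrize("dim", [32, 64, 128], ids=lambda d: f"dim={d}")
          def test_shapes(dim):
              assert dim % 32 == 0
      
          @pytest.mark.benchmark
          def test_matmul_benchmark():
              # Deselected in quick runs via:  pytest -k "not benchmark"
              ...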
  15. 30 Jan, 2024 1 commit
    • Ruff fixes (#984) · 706ec24d
      Aarni Koskela authored
      
      
      * Adjust Ruff configuration
      
      * do not always autofix
      * be less strict around tests and benchmarks
      * adjust ignores for now
      
      * Ruff: autofix I and F401
      
      * Apply ruff autofixes
      
      * Fix RUF013 complaint
      
      * Fix mutable default in replace_linear
      
      * Don't use bare except
      
      * Wrap bitsandbytes.__main__ entrypoint in function; fix "sensible" typo
      
      * Fix ruff B008 (function call in arguments)
      
      * Add ruff noqas as suitable
      
      * Fix RUF005 (splat instead of concatenating)
      
      * Fix B018 (useless expression)
      
      * Add pre-commit configuration + GitHub Actions lint workflow
      
      * Fix unused `e` in bitsandbytes/__main__.py
      
      * fix merge conflict resolution error
      
      * run pre-commit hook
      
      ---------
      Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
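      The mutable-default fix mentioned above follows a standard Python pattern (Ruff rule B006); replace_linear's real signature may differ from this sketch.
      
          from typing import Optional
      
          # A default like skip_modules=["lm_head"] is created once at function
          # definition time and shared by every call that omits the argument.
          # The fix: default to None and build the list per call.
          def replace_linear(model, skip_modules: Optional[list] = None):
              if skip_modules is None:
                  skip_modules = ["lm_head"]
              ...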
  16. 24 Jan, 2024 1 commit
  17. 17 Jan, 2024 1 commit
    • Initial FSDP Support for QLoRA Finetuning (#970) · dcfb6f81
      Benjamin Warner authored
      
      
      This PR adds initial FSDP support for training QLoRA models. It enables basic FSDP and CPU offload support; low-memory training via FSDP's sync_module_states option is not yet supported.
      
      This PR builds on #840 (commit 8278fca) and the BNB FSDP work by @TimDettmers and @Titus-von-Koeller.
      
      An example of using this PR to finetune QLoRA models with FSDP can be found in the demo repo: AnswerDotAi/fsdp_qlora.
      
      * Minimal changes for fp32 4bit storage from BNB commit 8278fca
      
      * Params4bit with selectable storage dtype
      
      * possible fix for double quantizing linear weight & quant storage dtype
      
      * minor fixes in Params4bit for peft tests
      
      * remove redundant
      
      * add float16
      
      * update test
      
      * Remove float16 quant cast as there are fp32, bf16, & fp16 quant kernels
      
      ---------
      Co-authored-by: Kerem Turgutlu <keremturgutlu@gmail.com>
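      The "selectable storage dtype" matters because FSDP flat-shards parameters and expects uniform dtypes: storing the packed 4-bit weights as fp32 lets them shard alongside regular fp32 parameters. A usage sketch, assuming the quant_storage keyword this PR introduced (check the current Linear4bit API before relying on it):
      
          import torch
          import bitsandbytes as bnb
      
          # QLoRA + FSDP configuration: keep quantized weights in fp32 storage.
          layer = bnb.nn.Linear4bit(
              1024, 1024,
              quant_type="nf4",
              compute_dtype=torch.bfloat16,
              quant_storage=torch.float32,  # keyword added by this PR (assumption)
          )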
  18. 10 Nov, 2023 1 commit
  19. 09 Nov, 2023 1 commit
  20. 08 Nov, 2023 1 commit
  21. 02 Nov, 2023 3 commits