Commits · 31034b4fcd898e88c9514b6cdc7a5675fcb8b6fe · OpenDAS / bitsandbytes

18 Jun, 2025 1 commit
- Update unit tests for HPU (#1682) · 31034b4f
  Chetan Kumar Verma authored Jun 18, 2025
  
17 Jun, 2025 1 commit

CI: Setup HPU nightly tests (#1681) · 29564ad6

Matthew Douglas authored Jun 17, 2025

* Setup XPU CI

* CI: expand XPU matrix

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* skip some fp4 tests on hpu

* skip some fp4 tests on hpu

* skip gemv tests on hpu

* test

* Additional test patches for HPU

* HPU test update

* HPU test update

* HPU test update

* HPU test update

* Format


06 Jun, 2025 1 commit
- Fix Linear4bit warnings/test for compute dtype · e9f3605f
  Matthew Douglas authored Jun 06, 2025
  
04 Jun, 2025 1 commit

Deprecation cleanup (#1669) · 849d9449

Matthew Douglas authored Jun 04, 2025

* Deprecation cleanup: remove histogram_scatter_add_2d

* Deprecation cleanup: vectorwise_mm_dequant

* Deprecation cleanup: vectorwise_quant

* Remove unused test

* Optimizer test cleanup

* Deprecations: remove estimate_quantiles, create_quantile_map

* Move deprecated test


03 Jun, 2025 1 commit
- Tests: don't require grad on weights for test_kbit_backprop · 55ebaac7
  Matthew Douglas authored Jun 03, 2025
  
28 May, 2025 1 commit

Enable CPU/XPU native and ipex path (#1628) · aaa71d7e

jiqing-feng authored May 29, 2025



* enable ipex
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cpu 8bit quantization
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix int8 and nf4 cpu inference
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add cpu fp4 and rem
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix dequantize nf4 xpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix ipex op
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix dequantize nf4 name
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix dequantize nf4 ipex
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix matmul8bitfp
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable cpu tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix quantize blockwise output shape
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix quant_storage bf16 and gemv cpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cpu tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix lib
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip xpu dequantize blockwise op check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix matmul8bit
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip not used function tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix matmul8bit fp
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check ipex before MatMul8bitFp
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update ipex install guide
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update install guide
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix error log
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix error log
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update comment
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* move torch op to default
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert ipex check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix code table device
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix code table device
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu ops
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>


19 May, 2025 1 commit

CI runner updates (#1643) · cdcae8d3

Matthew Douglas authored May 19, 2025

* Test g5g runner

* Switch L4 to L40S runner; swap GitHub Linux T4 runner for AWS g4dn

* Run tests on last 2 pytorch stable releases

* Run tests on last 2 pytorch stable releases


28 Apr, 2025 1 commit

Add simple op implementations for CPU (#1602) · 10b9d4cd

Matthew Douglas authored Apr 28, 2025

* Additional 4bit CPU ops

* Additional 4bit CPU ops

* Implement additional device-agnostic ops and test updates

* More test fixes

* int8 tests passing

* Fix feature flag for multi_backend


22 Apr, 2025 1 commit

Updates for device agnosticism (#1601) · 1088ec52

Matthew Douglas authored Apr 22, 2025

* Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit

* Make test suite more device-agnostic

* Additional device agnostic tests

* Additional device agnosticism for tests

* Add BNB_TEST_DEVICE env var to manually select device for unit tests

* Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit

* Make test suite more device-agnostic

* Additional device agnostic tests

* Additional device agnosticism for tests

* Add BNB_TEST_DEVICE env var to manually select device for unit tests

* Small bugfix for int8 test

* Exclude backward() from code coverage reports

* Params4bit: don't try to quantize when moving to meta device
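
The `BNB_TEST_DEVICE` environment variable mentioned in this commit could be consumed along the following lines. This is only a minimal sketch, not the project's actual test helper: the function name `select_test_devices` and the `DEFAULT_DEVICES` fallback list are assumptions made for illustration.

```python
import os

# Hypothetical fallback list; the real test suite may detect devices differently.
DEFAULT_DEVICES = ["cpu"]


def select_test_devices(env=None):
    """Return the device names the unit tests should run on.

    If BNB_TEST_DEVICE is set, only that device is used; otherwise the
    (assumed) default list is returned.
    """
    env = os.environ if env is None else env
    override = env.get("BNB_TEST_DEVICE", "").strip()
    if override:
        return [override]
    return list(DEFAULT_DEVICES)


# Passing a plain dict keeps the example independent of the real environment.
print(select_test_devices({"BNB_TEST_DEVICE": "xpu"}))  # ['xpu']
print(select_test_devices({}))                          # ['cpu']
```

With a helper of this shape, setting `BNB_TEST_DEVICE=cpu` before invoking the test runner would restrict the parametrized tests to a single device.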


05 Dec, 2024 1 commit

LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d

Matthew Douglas authored Dec 05, 2024



* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation

* Fix unintended change

* New naive mm_dequant kernel for row-major; cleanup

* fix

* int8 refactor: initial sparse decomp, cleanup

* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup

* int8: inference optimizations, some cleanup

* int8: more tests passing, cleanup

* int8 - more cleanup, most tests passing

* int8: specify CUDA stream for int8 ops

* perf: reduce overhead from getting cudaStream ptr

* Mark some functions for deprecation.

* int8 sparse decomp: small perf improvement

* update setup.py

* Update bitsandbytes/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn

* int8 cleanup

* Ignore ruff rule ISC001 (incompatible with formatter)

* add comment

* int8 more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8: rename / deprecate old fn signatures

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* type annotation

* format update

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Add comment to explain division optimization

* more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Type annotations, cleanup

* remove unused kernels; improved type annotations

* small perf optimization for single-GPU systems

* small perf optimization for single-GPU systems

* update docstrings

* Improve docs and tests

* Update docstring

* Update test

* add benchmarking script

* test cleanup: add deprecated marker, move benchmarks out

* Add int8 dequant function; misc improvements

* int8 matmul fallback for inner dims not divisible by 4

* improve register usage of kInt8VectorQuant - especially for A100/H100

* disable fail-fast for package build

* maxwell compat

* ptxas verbose

* docs update

* doc update

* backward fix

* Bugfix sparse decomp

* Int8 fix for PEFT OLoRA init

* Fix test for deprecated spmm_coo

* test improvement

* doc update

* typo

* doc cleanup

* docs

* add inference benchmark script

* Add benchmarks, doc update

---------
Co-authored-by: Aarni Koskela <akx@iki.fi>


06 Aug, 2024 1 commit

Embedding4bit and Embedding8bit implementation (#1292) · 6d714a5c

Vladimir Malinovskii authored Aug 06, 2024



* Embedding4bit and Embedding8bit implementation

* lint

* Update bitsandbytes/nn/modules.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update bitsandbytes/nn/modules.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update bitsandbytes/nn/modules.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* saving -> Saving

---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>


15 Jul, 2024 1 commit

Fixed tests for cpu only platforms (#1259) · 39b42e74

Vladimir Malinovskii authored Jul 15, 2024

* fixed test_4bit_warnings on cpu-only platforms

* fixed linear8bit-based tests for cpu only platforms


13 Mar, 2024 1 commit
- Reformat with ruff-format · 5a4263f4
  Ruff authored Feb 24, 2024
  
21 Feb, 2024 1 commit
- tests: fix all_close to respect max 2 positional args (#1074) · d11b5068
  Titus authored Feb 21, 2024
  
05 Feb, 2024 1 commit
- Enable crate-ci/typos lint; fix typos (#1005) · 8c507d92
  Aarni Koskela authored Feb 05, 2024
```
Co-authored-by: Titus von Koeller <titus@vonkoeller.com>

fix erroneous correction
```
01 Feb, 2024 2 commits

Enable line-ending and other hygiene lints (#1006) · 6974920b

Aarni Koskela authored Feb 01, 2024


Test improvements (#1001) · 2336a45c

Aarni Koskela authored Feb 01, 2024

* test_nvidia_transform: fix variable reference

`out_order` is the global parametrization list, not the test fixture argument

* Make `parametrize` use more idiomatic

* Use a more deterministic helper for `dim*` determination

* Convert NO_CUBLASLT errors into skips too

* Mark slow and benchmark tests as such (allows `-k "not benchmark"`)


30 Jan, 2024 1 commit

Ruff fixes (#984) · 706ec24d

Aarni Koskela authored Jan 30, 2024



* Adjust Ruff configuration

* do not autofix always
* be less strict around tests and benchmarks
* adjust ignores for now

* Ruff: autofix I and F401

* Apply ruff autofixes

* Fix RUF013 complaint

* Fix mutable default in replace_linear

* Don't use bare except

* Wrap bitsandbytes.__main__ entrypoint in function; fix "sensible" typo

* Fix ruff B008 (function call in arguments)

* Add ruff noqas as suitable

* Fix RUF005 (splat instead of concatenating)

* Fix B018 (useless expression)

* Add pre-commit configuration + GitHub Actions lint workflow

* Fix unused `e` in bitsandbytes/__main__.py

* fix merge conflict resolution error

* run pre-commit hook

---------
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
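
The "mutable default" item in the list above refers to a general Python pitfall. The sketch below illustrates it with hypothetical helper functions (`add_skip_buggy`, `add_skip_fixed`); it does not reproduce the actual `replace_linear` signature or the fix applied in this commit.

```python
def add_skip_buggy(name, skip_modules=[]):
    # The default list is created once, at function definition time,
    # so every call without an explicit argument shares the same object.
    skip_modules.append(name)
    return skip_modules


def add_skip_fixed(name, skip_modules=None):
    # Use None as a sentinel and build a fresh list on each call.
    if skip_modules is None:
        skip_modules = []
    skip_modules.append(name)
    return skip_modules


first = add_skip_buggy("lm_head")
second = add_skip_buggy("embed")
print(second)  # ['lm_head', 'embed']: state leaked from the first call

print(add_skip_fixed("lm_head"))  # ['lm_head']
print(add_skip_fixed("embed"))    # ['embed']
```

An immutable default such as a tuple is another common way to avoid the shared state.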


24 Jan, 2024 1 commit

Tests: improve CUDA support detection (#985) · f1c75741

Aarni Koskela authored Jan 24, 2024

* implicitly skip any test that implicitly uses CUDA on a non-CUDA box
* add a `requires_cuda` fixture


08 Jan, 2024 1 commit
- Fixed bnb input in setup.py. Bumped version for release. · 4870580f
  Tim Dettmers authored Jan 07, 2024
  
22 Jul, 2023 1 commit
- Added better default compute_dtype handling for Linear4bit layers. · 412fd0e7
  Tim Dettmers authored Jul 22, 2023
  
10 Jul, 2023 1 commit
- Added test for Param4bit.to() and fixed double quant behavior. · cef519c8
  Tim Dettmers authored Jul 09, 2023
  
07 May, 2023 1 commit
- Fixed gradient accumulation test. · 4bd11518
  Tim Dettmers authored May 07, 2023
  
12 Apr, 2023 1 commit
- Refactored simulated fp8 modules into research.nn. · dd562c24
  Tim Dettmers authored Apr 12, 2023
  
04 Apr, 2023 1 commit
- Fixed ParamsIn4 init; fixed PyTorch 2.0 test failure. · 1ccb7bde
  Tim Dettmers authored Apr 03, 2023
  
03 Apr, 2023 1 commit
- Refactor FP4 into 4Bit and integrate NF4 data type. · 4ea489d3
  Tim Dettmers authored Apr 03, 2023
  
01 Apr, 2023 1 commit
- Added 8-bit compression to quantization statistics. · 51a21df7
  Tim Dettmers authored Apr 01, 2023
  
14 Feb, 2023 1 commit
- Fixed LinearFP8 and added tests. · 2dfa3ce1
  Tim Dettmers authored Feb 13, 2023
  
05 Feb, 2023 2 commits
- Added backprop test for Linear8bitLt and LinearFP4. · 7f0773ae
  Tim Dettmers authored Feb 05, 2023
  
- Added bias test for LinearFP4 and basic test. · c0c352b3
  Tim Dettmers authored Feb 05, 2023
  
02 Feb, 2023 1 commit
- Added Int8 matmul support for all GPUs. Full backward support. · de535889
  Tim Dettmers authored Feb 01, 2023
  
27 Oct, 2022 1 commit

Simplify statements into equivalent, modern variants · 0b078403

Tom Aarsen authored Oct 27, 2022

Applied via `pyupgrade --py37-plus`. The changes remove, for example, explicit subclassing from `object`, calls of the form `super(ThisClass, self)`, and old-style string formatting.
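
The kinds of rewrites described in this commit message can be illustrated with a small before/after pair. The classes below are hand-written examples, not code from the repository, and the exact output of the tool may differ (for instance, it may emit `str.format` rather than an f-string in some cases).

```python
# Old style, as commonly written for Python 2 compatibility.
class OldCounter(object):
    def __init__(self, start):
        super(OldCounter, self).__init__()
        self.value = start

    def describe(self):
        return "value=%s" % self.value


# Modern equivalent for Python 3.7+.
class NewCounter:
    def __init__(self, start):
        super().__init__()
        self.value = start

    def describe(self):
        return f"value={self.value}"


# Both variants behave identically; only the spelling changed.
print(OldCounter(3).describe())  # value=3
print(NewCounter(3).describe())  # value=3
```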


24 Oct, 2022 1 commit
- Isolated CUDASetup logging; all tests green. · df86625a
  Tim Dettmers authored Oct 24, 2022
  
20 Sep, 2022 2 commits
- set threshold · 292a4787
  Tim Dettmers authored Sep 20, 2022
  
- review · a07825ac
  justheuristic authored Sep 20, 2022
  
17 Sep, 2022 5 commits
- cast device · cff3a715
  justheuristic authored Sep 18, 2022
  
- cast device · 32a9a88f
  justheuristic authored Sep 18, 2022
  
- cast device · 01b4c6a0
  justheuristic authored Sep 18, 2022
  
- cast device · e4086a27
  justheuristic authored Sep 18, 2022
  
- cast device · 725cc729
  justheuristic authored Sep 18, 2022
  