Commits · 1088ec527038abc826038a74a4e2ac38e8f1b778 · OpenDAS / bitsandbytes

22 Apr, 2025 1 commit

Updates for device agnosticism (#1601) · 1088ec52

Matthew Douglas authored Apr 22, 2025

* Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit

* Make test suite more device-agnostic

* Additional device agnostic tests

* Additional device agnosticism for tests

* Add BNB_TEST_DEVICE env var to manually select device for unit tests

* Small bugfix for int8 test

* Exclude backward() from code coverage reports

* Params4bit: don't try to quantize when moving to meta device

1088ec52
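
As an illustration of the BNB_TEST_DEVICE item in the commit above, a test helper could read its device from the environment roughly as follows. This is a minimal sketch and not the project's actual code: the function name `select_test_device`, the `"cpu"` fallback, and the list of accepted device names are all assumptions made for the example.

```python
import os

# Hypothetical set of device names; the real test suite may accept others.
KNOWN_DEVICES = ("cuda", "xpu", "cpu")


def select_test_device(environ=None, default="cpu"):
    """Return the device named by BNB_TEST_DEVICE, or `default` if it is unset."""
    environ = os.environ if environ is None else environ
    requested = environ.get("BNB_TEST_DEVICE", "").strip().lower()
    if not requested:
        return default
    # Allow an index suffix such as "cuda:1" by checking only the base name.
    base = requested.split(":", 1)[0]
    if base not in KNOWN_DEVICES:
        raise ValueError(f"Unsupported BNB_TEST_DEVICE value: {requested!r}")
    return requested
```

Passing the environment as a parameter keeps the helper easy to exercise without mutating `os.environ` in the test process.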

05 Dec, 2024 1 commit

LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d

Matthew Douglas authored Dec 05, 2024



* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation

* Fix unintended change

* New naive mm_dequant kernel for row-major; cleanup

* fix

* int8 refactor: initial sparse decomp, cleanup

* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup

* int8: inference optimizations, some cleanup

* int8: more tests passing, cleanup

* int8 - more cleanup, most tests passing

* int8: specify CUDA stream for int8 ops

* perf: reduce overhead from getting cudaStream ptr

* Mark some functions for deprecation.

* int8 sparse decomp: small perf improvement

* update setup.py

* Update bitsandbytes/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn

* int8 cleanup

* Ignore ruff rule ISC001 (incompatible with formatter)

* add comment

* int8 more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8: rename / deprecate old fn signatures

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* type annotation

* format update

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Add comment to explain division optimization

* more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Type annotations, cleanup

* remove unused kernels; improved type annotations

* small perf optimization for single-GPU systems

* small perf optimization for single-GPU systems

* update docstrings

* Improve docs and tests

* Update docstring

* Update test

* add benchmarking script

* test cleanup: add deprecated marker, move benchmarks out

* Add int8 dequant function; misc improvements

* int8 matmul fallback for inner dims not divisible by 4

* improve register usage of kInt8VectorQuant - especially for A100/H100

* disable fail-fast for package build

* maxwell compat

* ptxas verbose

* docs update

* doc update

* backward fix

* Bugfix sparse decomp

* Int8 fix for PEFT OLoRA init

* Fix test for deprecated spmm_coo

* test improvement

* doc update

* typo

* doc cleanup

* docs

* add inference benchmark script

* Add benchmarks, doc update

---------
Co-authored-by: Aarni Koskela <akx@iki.fi>

81e6345d

06 Aug, 2024 1 commit

Embedding4bit and Embedding8bit implementation (#1292) · 6d714a5c

Vladimir Malinovskii authored Aug 06, 2024



* Embedding4bit and Embedding8bit implementation

* lint

* Update bitsandbytes/nn/modules.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update bitsandbytes/nn/modules.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update bitsandbytes/nn/modules.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* saving -> Saving

---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

6d714a5c

15 Jul, 2024 1 commit

Fixed tests for cpu only platforms (#1259) · 39b42e74

Vladimir Malinovskii authored Jul 15, 2024

* fixed test_4bit_warnings on cpu-only platforms

* fixed linear8bit-based tests for cpu only platforms

39b42e74

13 Mar, 2024 1 commit
- Reformat with ruff-format · 5a4263f4
  Ruff authored Feb 24, 2024
  
  5a4263f4
21 Feb, 2024 1 commit
- tests: fix all_close to respect max 2 positional args (#1074) · d11b5068
  Titus authored Feb 21, 2024
  
  d11b5068
05 Feb, 2024 1 commit
- Enable crate-ci/typos lint; fix typos (#1005) · 8c507d92
  Aarni Koskela authored Feb 05, 2024
  
  Co-authored-by: Titus von Koeller <titus@vonkoeller.com>
  
  fix erroneous correction
  
  8c507d92
01 Feb, 2024 2 commits

Enable line-ending and other hygiene lints (#1006) · 6974920b
Aarni Koskela authored Feb 01, 2024

6974920b

Test improvements (#1001) · 2336a45c

Aarni Koskela authored Feb 01, 2024

* test_nvidia_transform: fix variable reference

`out_order` is the global parametrization list, not the test fixture argument

* Make `parametrize` use more idiomatic

* Use a more deterministic helper for `dim*` determination

* Convert NO_CUBLASLT errors into skips too

* Mark slow and benchmark tests as such (allows `-k "not benchmark"`)

2336a45c

30 Jan, 2024 1 commit

Ruff fixes (#984) · 706ec24d

Aarni Koskela authored Jan 30, 2024



* Adjust Ruff configuration

* do not autofix always
* be less strict around tests and benchmarks
* adjust ignores for now

* Ruff: autofix I and F401

* Apply ruff autofixes

* Fix RUF013 complaint

* Fix mutable default in replace_linear

* Don't use bare except

* Wrap bitsandbytes.__main__ entrypoint in function; fix "sensible" typo

* Fix ruff B008 (function call in arguments)

* Add ruff noqas as suitable

* Fix RUF005 (splat instead of concatenating)

* Fix B018 (useless expression)

* Add pre-commit configuration + GitHub Actions lint workflow

* Fix unused `e` in bitsandbytes/__main__.py

* fix merge conflict resolution error

* run pre-commit hook

---------
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>

706ec24d
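
Several items in the commit above name common lint fixes (mutable default argument, bare `except`, RUF005). The sketch below shows the general shape of each fix; the function names are hypothetical stand-ins and do not reproduce `replace_linear` or any other bitsandbytes code.

```python
def register(name, seen=None):
    """Hypothetical helper showing the mutable-default fix."""
    # Before: `seen=[]` would share one list across every call.
    # After: default to None and create a fresh list per call.
    seen = [] if seen is None else seen
    seen.append(name)
    return seen


def with_extra(items, extra):
    # RUF005: unpack into a new list, i.e. `[*items, extra]`,
    # rather than concatenating with `items + [extra]`.
    return [*items, extra]


def parse_int(text):
    # Catch the specific exception instead of using a bare `except:`.
    try:
        return int(text)
    except ValueError:
        return None
```

With the `None` default, two successive calls such as `register("a")` and `register("b")` each return a one-element list instead of accumulating into a shared one.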

24 Jan, 2024 1 commit

Tests: improve CUDA support detection (#985) · f1c75741

Aarni Koskela authored Jan 24, 2024

* automatically skip any test that implicitly uses CUDA on a non-CUDA box
* add a `requires_cuda` fixture

f1c75741

08 Jan, 2024 1 commit
- Fixed bnb input in setup.py. Bumped version for release. · 4870580f
  Tim Dettmers authored Jan 07, 2024
  
  4870580f
22 Jul, 2023 1 commit
- Added better default compute_dtype handling for Linear4bit layers. · 412fd0e7
  Tim Dettmers authored Jul 22, 2023
  
  412fd0e7
10 Jul, 2023 1 commit
- Added test for Param4bit.to() and fixed double quant behavior. · cef519c8
  Tim Dettmers authored Jul 09, 2023
  
  cef519c8
07 May, 2023 1 commit
- Fixed gradient accumulation test. · 4bd11518
  Tim Dettmers authored May 07, 2023
  
  4bd11518
12 Apr, 2023 1 commit
- Refactored simulated fp8 modules into research.nn. · dd562c24
  Tim Dettmers authored Apr 12, 2023
  
  dd562c24
04 Apr, 2023 1 commit
- Fixed ParamsIn4 init; fixed PyTorch 2.0 test failure. · 1ccb7bde
  Tim Dettmers authored Apr 03, 2023
  
  1ccb7bde
03 Apr, 2023 1 commit
- Refactor FP4 into 4Bit and integrate NF4 data type. · 4ea489d3
  Tim Dettmers authored Apr 03, 2023
  
  4ea489d3
01 Apr, 2023 1 commit
- Added 8-bit compression to quantization statistics. · 51a21df7
  Tim Dettmers authored Apr 01, 2023
  
  51a21df7
14 Feb, 2023 1 commit
- Fixed LinearFP8 and added tests. · 2dfa3ce1
  Tim Dettmers authored Feb 13, 2023
  
  2dfa3ce1
05 Feb, 2023 2 commits
- Added backprop test for Linear8bitLt and LinearFP4. · 7f0773ae
  Tim Dettmers authored Feb 05, 2023
  
  7f0773ae
- Added bias test for LinearFP4 and basic test. · c0c352b3
  Tim Dettmers authored Feb 05, 2023
  
  c0c352b3
02 Feb, 2023 1 commit
- Added Int8 matmul support for all GPUs. Full backward support. · de535889
  Tim Dettmers authored Feb 01, 2023
  
  de535889
27 Oct, 2022 1 commit

Simplify statements into equivalent, modern variants · 0b078403

Tom Aarsen authored Oct 27, 2022

Done via pyupgrade --py37-plus. The changes include, for example, dropping explicit subclassing from object, replacing super(ThisClass, self) calls with super(), and updating old-style string formatting.

0b078403
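
The kinds of rewrite described in the commit above look roughly like this. The class names are hypothetical and chosen only for illustration; the comments show the older spelling that each line replaces.

```python
class Base:  # before: `class Base(object):`
    def __init__(self, name):
        self.name = name


class Child(Base):
    def __init__(self, name, bits):
        super().__init__(name)  # before: super(Child, self).__init__(name)
        self.bits = bits

    def describe(self):
        # before: "{} uses {} bits".format(self.name, self.bits)
        return f"{self.name} uses {self.bits} bits"
```

Behaviour is unchanged by these rewrites; only the syntax is modernised for Python 3.7 and later.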

24 Oct, 2022 1 commit
- Isolated CUDASetup logging; all tests green. · df86625a
  Tim Dettmers authored Oct 24, 2022
  
  df86625a
20 Sep, 2022 2 commits
- set threshold · 292a4787
  Tim Dettmers authored Sep 20, 2022
  
  292a4787
- review · a07825ac
  justheuristic authored Sep 20, 2022
  
  a07825ac
17 Sep, 2022 12 commits
- cast device · cff3a715
  justheuristic authored Sep 18, 2022
  
  cff3a715
- cast device · 32a9a88f
  justheuristic authored Sep 18, 2022
  
  32a9a88f
- cast device · 01b4c6a0
  justheuristic authored Sep 18, 2022
  
  01b4c6a0
- cast device · e4086a27
  justheuristic authored Sep 18, 2022
  
  e4086a27
- cast device · 725cc729
  justheuristic authored Sep 18, 2022
  
  725cc729
- cast before allclose · 28a9313d
  justheuristic authored Sep 18, 2022
  
  28a9313d
- cast before allclose · 95dafc64
  justheuristic authored Sep 18, 2022
  
  95dafc64
- debug · 37f805bb
  justheuristic authored Sep 18, 2022
  
  37f805bb
- pre-cast · 6a826c41
  justheuristic authored Sep 18, 2022
  
  6a826c41
- debug · d9b87898
  justheuristic authored Sep 18, 2022
  
  d9b87898
- run backward · 2cd047e3
  justheuristic authored Sep 18, 2022
  
  2cd047e3
- add memory efficient backward · 591f6039
  justheuristic authored Sep 18, 2022
  
  591f6039
17 Aug, 2022 1 commit
- Fixed bug in Linear8bitLt, when the bias is None. · 9d60b3c5
  Tim Dettmers authored Aug 17, 2022
  
  9d60b3c5