Commits · f0735f95174136a71a097ce54942c1e9a9d89a3a · OpenDAS / bitsandbytes

05 Dec, 2024 1 commit

LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d

Matthew Douglas authored Dec 05, 2024



* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation

* Fix unintended change

* New naive mm_dequant kernel for row-major; cleanup

* fix

* int8 refactor: initial sparse decomp, cleanup

* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup

* int8: inference optimizations, some cleanup

* int8: more tests passing, cleanup

* int8 - more cleanup, most tests passing

* int8: specify CUDA stream for int8 ops

* perf: reduce overhead from getting cudaStream ptr

* Mark some functions for deprecation.

* int8 sparse decomp: small perf improvement

* update setup.py

* Update bitsandbytes/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn

* int8 cleanup

* Ignore ruff rule ISC001 (incompatible with formatter)

* add comment

* int8 more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8: rename / deprecate old fn signatures

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* type annotation

* format update

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Add comment to explain division optimization

* more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Type annotations, cleanup

* remove unused kernels; improved type annotations

* small perf optimization for single-GPU systems

* small perf optimization for single-GPU systems

* update docstrings

* Improve docs and tests

* Update docstring

* Update test

* add benchmarking script

* test cleanup: add deprecated marker, move benchmarks out

* Add int8 dequant function; misc improvements

* int8 matmul fallback for inner dims not divisible by 4

* improve register usage of kInt8VectorQuant - especially for A100/H100

* disable fail-fast for package build

* maxwell compat

* ptxas verbose

* docs update

* doc update

* backward fix

* Bugfix sparse decomp

* Int8 fix for PEFT OLoRA init

* Fix test for deprecated spmm_coo

* test improvement

* doc update

* typo

* doc cleanup

* docs

* add inference benchmark script

* Add benchmarks, doc update

---------
Co-authored-by: Aarni Koskela <akx@iki.fi>

81e6345d

20 Sep, 2024 2 commits

Change 8bit optimizer blocksize 2048->256; additional bf16 support (#1365) · aa57bd89
Matthew Douglas authored Sep 20, 2024
```
* Change 8bit optimizer blocksize 2048->256; additional bf16 support
* Update tolerances for 8bit optimizer tests
```
aa57bd89

Add AdEMAMix optimizer (#1360) · d9645465

Matthew Douglas authored Sep 20, 2024

* Add AdEMAMix optimizer

* Add PagedAdEMAMix32bit, AdEMAMix32bit

* Add PagedAdEMAMix32bit, AdEMAMix32bit

* AdEMAMix: add support for alpha/beta3 scheduling

* Update paged AdEMAMix

d9645465

14 Aug, 2024 1 commit
- Bugfix: Load correct nocublaslt library variant when BNB_CUDA_VERSION override is set (#1318) · a4875fc0
  Matthew Douglas authored Aug 14, 2024
  
  a4875fc0
06 Aug, 2024 1 commit

Embedding4bit and Embedding8bit implementation (#1292) · 6d714a5c

Vladimir Malinovskii authored Aug 06, 2024



* Embedding4bit and Embedding8bit implementation

* lint

* Update bitsandbytes/nn/modules.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update bitsandbytes/nn/modules.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update bitsandbytes/nn/modules.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* saving -> Saving

---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

6d714a5c

15 Jul, 2024 1 commit

Fixed tests for cpu only platforms (#1259) · 39b42e74

Vladimir Malinovskii authored Jul 15, 2024

* fixed test_4bit_warnings on cpu-only platforms

* fixed linear8bit-based tests for cpu only platforms

39b42e74

30 May, 2024 1 commit

FIX Make Int8Params deepcopy-able · ed99b3c1

Benjamin Bossan authored May 30, 2024

This requires to implement the __deepcopy__ method in Int8Params.
Moreover, there was an issue in the Linear8BitLT constructor that would
assign instance attributes to the class, which is now fixed.

Please review carefully that this does not impact existing code.

Tests that I ran:

- pytest tests/test_linear8bitlt.py
- in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_gpu_examples.py
- in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_common_gpu.py
- in transformers: RUN_SLOW=1 python -m pytest tests/quantization/bnb -x

ed99b3c1

29 May, 2024 1 commit

FIX Prevent __getstate__ from mutating Params4bit · 2fb212bd

Benjamin Bossan authored May 29, 2024

As discussed internally, use state = self.__dict__.copy(), which is also
what the Python docs recommend.

2fb212bd

02 Apr, 2024 1 commit
- Tests: improve memory usage (#1147) · bed0860b
  Matthew Douglas authored Apr 02, 2024
  
  bed0860b
29 Mar, 2024 1 commit
- Fix 4bit quantization with blocksize=4096 · c17fb8eb
  Matthew Douglas authored Mar 29, 2024
  
  c17fb8eb
13 Mar, 2024 2 commits
- Reformat with ruff-format · 5a4263f4
  Ruff authored Feb 24, 2024
  
  5a4263f4
- Rework CUDA setup and diagnostics · e2db55ed
  Aarni Koskela authored Feb 06, 2024
  
  e2db55ed
11 Mar, 2024 2 commits
- Add additional guard for "no NVIDIA driver" · 2416dd36
  Aarni Koskela authored Mar 08, 2024
  
  2416dd36
- Soft-require `transformers` in tests · 62249b4a
  Aarni Koskela authored Mar 08, 2024
  
  62249b4a
06 Mar, 2024 1 commit
- Deduplicate helpers & fix lint issues from #1099 (#1107) · 048a2d40
  Aarni Koskela authored Mar 06, 2024
  
  048a2d40
05 Mar, 2024 1 commit
- adding whole Linear8bitLt/Linear4bit module save/load serialization (#1099) · a1c0844b
  rdyro authored Mar 05, 2024
  
  a1c0844b
21 Feb, 2024 3 commits

add deepcopy and copy for Param4bit (#1060) · cfd6ac75

Marc Sun authored Feb 21, 2024



* fix deepcopy and copy

* add tests

* remove line

* ruff fix

* ruff

* Update tests/test_linear4bit.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* add missing state

* ruff format

* ignore formatting commit for git blame

* Params4bit should be initialized as frozen by default

* add test for serialization round-tripping

* add comparison capability for QuantSate

* add back accidentally remove line

---------
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>

cfd6ac75

tests: fix all_close to respect max 2 positional args (#1074) · d11b5068
Titus authored Feb 21, 2024

d11b5068
tests/helpers.py: fix py38 vers incompatibility from other PR · 0bf71989
Titus authored Feb 21, 2024

0bf71989

05 Feb, 2024 1 commit
- Enable crate-ci/typos lint; fix typos (#1005) · 8c507d92
  Aarni Koskela authored Feb 05, 2024
```
Co-authored-by: Titus von Koeller <titus@vonkoeller.com>

fix erroneous correction
```
  8c507d92
01 Feb, 2024 3 commits

Enable line-ending and other hygiene lints (#1006) · 6974920b
Aarni Koskela authored Feb 01, 2024

6974920b

Test improvements (#1001) · 2336a45c

Aarni Koskela authored Feb 01, 2024

* test_nvidia_transform: fix variable reference

`out_order` is the global parametrization list, not the test fixture argument

* Make `parametrize` use more idiomatic

* Use a more deterministic helper for `dim*` determination

* Convert NO_CUBLASLT errors into skips too

* Mark slow and benchmark tests as such (allows `-k "not benchmark"`)

2336a45c

test_nvidia_transform: fix variable reference (#1000) · 1a0dc5c3
Aarni Koskela authored Feb 01, 2024
```
`out_order` is the global parametrization list, not the test fixture argument
```
1a0dc5c3

30 Jan, 2024 1 commit

Ruff fixes (#984) · 706ec24d

Aarni Koskela authored Jan 30, 2024



* Adjust Ruff configuration

* do not autofix always
* be less strict around tests and benchmarks
* adjust ignores for now

* Ruff: autofix I and F401

* Apply ruff autofixes

* Fix RUF013 complaint

* Fix mutable default in replace_linear

* Don't use bare except

* Wrap bitsandbytes.__main__ entrypoint in function; fix "sensible" typo

* Fix ruff B008 (function call in arguments)

* Add ruff noqas as suitable

* Fix RUF005 (splat instead of concatenating)

* Fix B018 (useless expression)

* Add pre-commit configuration + GitHub Actions lint workflow

* Fix unused `e` in bitsandbytes/__main__.py

* fix merge conflict resolution error

* run pre-commit hook

---------
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>

706ec24d

24 Jan, 2024 1 commit

Tests: improve CUDA support detection (#985) · f1c75741

Aarni Koskela authored Jan 24, 2024

* implicitly skip any test that implicitly uses CUDA on a non-CUDA box
* add a `requires_cuda` fixture

f1c75741

17 Jan, 2024 1 commit

Initial FSDP Support for QLoRA Finetuning (#970) · dcfb6f81

Benjamin Warner authored Jan 16, 2024



This PR adds initial FSDP support for training QLoRA models. It enables basic FSDP and CPU Offload support, with low memory training via FSDP.sync_module_states option unsupported.

This PR builds off of #840 commit 8278fca and BNB FSDP by @TimDettmers and @Titus-von-Koeller.

An example of using this PR to finetune QLoRA models with FSDP can be found in the demo repo: AnswerDotAi/fsdp_qlora.

* Minimal changes for fp32 4bit storage from BNB commit 8278fca

* Params4bit with selectable storage dtype

* possible fix for double quantizing linear weight & quant storage dtype

* minor fixes in Params4bit for peft tests

* remove redundant

* add float16

* update test

* Remove float16 quant cast as there are fp32, bf16, & fp16 quant kernels

---------
Co-authored-by: Kerem Turgutlu <keremturgutlu@gmail.com>

dcfb6f81

08 Jan, 2024 1 commit
- Fixed bnb input in setup.py. Bumped version for release. · 4870580f
  Tim Dettmers authored Jan 07, 2024
  
  4870580f
03 Dec, 2023 1 commit
- chore: update dev setup · 2c605d03
  Titus von Koeller authored Dec 03, 2023
  
  2c605d03
10 Nov, 2023 1 commit
- test comment removed · 45864262
  Ruslan Svirschevski authored Nov 10, 2023
  
  45864262
09 Nov, 2023 1 commit
- fixes for init and tests · ffd46ce1
  Ruslan Svirschevski authored Nov 10, 2023
  
  ffd46ce1
08 Nov, 2023 1 commit
- partially reverted 76b40a5c · 781fcd5b
  Ruslan Svirschevski authored Nov 08, 2023
  
  781fcd5b
02 Nov, 2023 5 commits
- save/load via state_dict now · 76b40a5c
  Ruslan Svirschevski authored Oct 25, 2023
  
  76b40a5c
- test update · 965fd5d5
  Ruslan Svirschevski authored Sep 20, 2023
  
  965fd5d5
- reverted fn signatures in functional() · 4c11d6dc
  Ruslan Svirschevski authored Sep 20, 2023
  
  4c11d6dc
- save/load 4bit squashed · 5bcc1ddc
  Ruslan Svirschevski authored Sep 11, 2023
  
  5bcc1ddc
- use QuantState class for quant_state · 61a4a20d
  Ruslan Svirschevski authored Sep 11, 2023
  
  61a4a20d
04 Aug, 2023 1 commit
- Fixed two bugs in dynamic data type creation. · 3c9aca91
  Tim Dettmers authored Aug 03, 2023
  
  3c9aca91
22 Jul, 2023 1 commit
- Added better default compute_dtype handling for Linear4bit layers. · 412fd0e7
  Tim Dettmers authored Jul 22, 2023
  
  412fd0e7
19 Jul, 2023 1 commit
- Increased occupancy. · c82f51c0
  Tim Dettmers authored Jul 19, 2023
  
  c82f51c0
17 Jul, 2023 1 commit
- Fix typo in test_optim.py · 87816e4e
  Ikko Eltociear Ashimine authored Jul 18, 2023
```
paramters -> parameters
```
  87816e4e