- 05 Dec, 2024 1 commit
-
-
Matthew Douglas authored
* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
* Fix unintended change
* New naive mm_dequant kernel for row-major; cleanup
* fix
* int8 refactor: initial sparse decomp, cleanup
* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
* int8: inference optimizations, some cleanup
* int8: more tests passing, cleanup
* int8 - more cleanup, most tests passing
* int8: specify CUDA stream for int8 ops
* perf: reduce overhead from getting cudaStream ptr
* Mark some functions for deprecation.
* int8 sparse decomp: small perf improvement
* update setup.py
* Update bitsandbytes/autograd/_functions.py (Co-authored-by: Aarni Koskela <akx@iki.fi>)
* Update bitsandbytes/functional.py (Co-authored-by: Aarni Koskela <akx@iki.fi>)
* Update bitsandbytes/functional.py (Co-authored-by: Aarni Koskela <akx@iki.fi>)
* Update bitsandbytes/research/autograd/_functions.py (Co-authored-by: Aarni Koskela <akx@iki.fi>)
* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
* int8 cleanup
* Ignore ruff rule ISC001 (incompatible with formatter)
* add comment
* int8 more cleanup
* Update bitsandbytes/functional.py (Co-authored-by: Aarni Koskela <akx@iki.fi>)
* int8: rename / deprecate old fn signatures
* Update bitsandbytes/functional.py (Co-authored-by: Aarni Koskela <akx@iki.fi>)
* type annotation
* format update
* Update bitsandbytes/research/autograd/_functions.py (Co-authored-by: Aarni Koskela <akx@iki.fi>)
* cleanup
* Add comment to explain division optimization
* more cleanup
* Update bitsandbytes/functional.py (Co-authored-by: Aarni Koskela <akx@iki.fi>)
* Update bitsandbytes/functional.py (Co-authored-by: Aarni Koskela <akx@iki.fi>)
* Update bitsandbytes/functional.py (Co-authored-by: Aarni Koskela <akx@iki.fi>)
* cleanup
* Type annotations, cleanup
* remove unused kernels; improved type annotations
* small perf optimization for single-GPU systems
* small perf optimization for single-GPU systems
* update docstrings
* Improve docs and tests
* Update docstring
* Update test
* add benchmarking script
* test cleanup: add deprecated marker, move benchmarks out
* Add int8 dequant function; misc improvements
* int8 matmul fallback for inner dims not divisible by 4
* improve register usage of kInt8VectorQuant - especially for A100/H100
* disable fail-fast for package build
* maxwell compat
* ptxas verbose
* docs update
* doc update
* backward fix
* Bugfix sparse decomp
* Int8 fix for PEFT OLoRA init
* Fix test for deprecated spmm_coo
* test improvement
* doc update
* typo
* doc cleanup
* docs
* add inference benchmark script
* Add benchmarks, doc update
--------
Co-authored-by: Aarni Koskela <akx@iki.fi>
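Several of the commits above touch row-major int8 quantization (the mm_dequant and kInt8VectorQuant work). As background, here is a minimal numpy sketch of row-wise absmax int8 quantization — illustrative only, with made-up function names, not the library's CUDA kernels:

```python
import numpy as np

def quantize_rowwise(A: np.ndarray):
    # Per-row absmax scaling maps values into the int8 range [-127, 127].
    absmax = np.abs(A).max(axis=1, keepdims=True)
    scale = np.where(absmax == 0, 1.0, absmax / 127.0)
    quantized = np.round(A / scale).astype(np.int8)
    return quantized, scale

def dequantize_rowwise(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Inverse transform: rescale int8 values back to float32.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_rowwise(A)
A_hat = dequantize_rowwise(q, s)
# Round-to-nearest bounds the per-element error by half a step (scale / 2).
assert np.all(np.abs(A - A_hat) <= s / 2 + 1e-6)
```

The real int8 matmul then runs on the quantized operands and dequantizes the accumulated int32 result, which is what the mm_dequant kernel above does for row-major layouts.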
-
- 20 Sep, 2024 2 commits
-
-
Matthew Douglas authored
* Change 8bit optimizer blocksize 2048->256; additional bf16 support
* Update tolerances for 8bit optimizer tests
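For context, blockwise 8-bit optimizer state quantization splits a flat tensor into fixed-size blocks, each carrying its own absmax scale, so a smaller blocksize (256 vs. 2048) gives finer-grained scales at a small memory cost. A rough numpy sketch with illustrative names — the real 8-bit optimizers use a dynamic quantization map rather than this linear stand-in:

```python
import numpy as np

BLOCKSIZE = 256  # the new default mentioned above; previously 2048

def quantize_blockwise(x: np.ndarray, blocksize: int = BLOCKSIZE):
    # Flatten, pad to a multiple of the blocksize, and scale per block.
    flat = x.ravel()
    pad = (-flat.size) % blocksize
    blocks = np.pad(flat, (0, pad)).reshape(-1, blocksize)
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(absmax == 0, 1.0, absmax / 127.0)
    return np.round(blocks / scale).astype(np.int8), scale

x = np.linspace(-1.0, 1.0, 1000, dtype=np.float32)
q, scale = quantize_blockwise(x)
assert q.shape == (4, BLOCKSIZE) and scale.shape == (4, 1)
```

Each block's worst-case quantization error is bounded by half of its own scale, which is why shrinking the block improves accuracy for tensors with outliers.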
-
Matthew Douglas authored
* Add AdEMAMix optimizer
* Add PagedAdEMAMix32bit, AdEMAMix32bit
* Add PagedAdEMAMix32bit, AdEMAMix32bit
* AdEMAMix: add support for alpha/beta3 scheduling
* Update paged AdEMAMix
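AdEMAMix (Pagliardini et al., 2024) mixes a fast and a slow exponential moving average of the gradient. A rough single-step numpy sketch of the update rule as described in the paper — illustrative only, not the bitsandbytes 32-bit/paged kernels, and omitting the alpha/beta3 schedulers mentioned above:

```python
import numpy as np

def ademamix_step(p, g, m1, m2, nu, t, lr=1e-3,
                  beta1=0.9, beta2=0.999, beta3=0.9999, alpha=5.0, eps=1e-8):
    # Fast EMA (Adam-style first moment) and slow EMA, mixed via alpha.
    m1 = beta1 * m1 + (1 - beta1) * g
    m2 = beta3 * m2 + (1 - beta3) * g
    nu = beta2 * nu + (1 - beta2) * g * g
    # Bias correction applies to the fast EMA and second moment only.
    m1_hat = m1 / (1 - beta1 ** t)
    nu_hat = nu / (1 - beta2 ** t)
    p = p - lr * (m1_hat + alpha * m2) / (np.sqrt(nu_hat) + eps)
    return p, m1, m2, nu

p = np.zeros(3)
g = np.array([1.0, -1.0, 0.5])
p, m1, m2, nu = ademamix_step(p, g, np.zeros(3), np.zeros(3), np.zeros(3), t=1)
assert np.all(np.isfinite(p)) and p[0] < 0  # steps against the gradient
```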
-
- 14 Aug, 2024 1 commit
-
-
Matthew Douglas authored
-
- 06 Aug, 2024 1 commit
-
-
Vladimir Malinovskii authored
* Embedding4bit and Embedding8bit implementation
* lint
* Update bitsandbytes/nn/modules.py (Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>)
* Update bitsandbytes/nn/modules.py (Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>)
* Update bitsandbytes/nn/modules.py (Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>)
* saving -> Saving
--------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
-
- 15 Jul, 2024 1 commit
-
-
Vladimir Malinovskii authored
* fixed test_4bit_warnings on CPU-only platforms
* fixed linear8bit-based tests for CPU-only platforms
-
- 30 May, 2024 1 commit
-
-
Benjamin Bossan authored
This requires implementing the `__deepcopy__` method in `Int8Params`. Moreover, there was an issue in the `Linear8bitLt` constructor that would assign instance attributes to the class, which is now fixed. Please review carefully that this does not impact existing code. Tests that I ran:
- pytest tests/test_linear8bitlt.py
- in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_gpu_examples.py
- in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_common_gpu.py
- in transformers: RUN_SLOW=1 python -m pytest tests/quantization/bnb -x
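Two minimal illustrations of the points above, using hypothetical classes rather than the actual Int8Params/Linear8bitLt code: a custom `__deepcopy__` that carries extra state across the copy, and the instance-vs-class attribute pitfall the constructor fix addresses:

```python
import copy

class ParamsLike:
    # Hypothetical stand-in for a parameter class with extra quantization state.
    def __init__(self, data, CB=None):
        self.data = data
        self.CB = CB  # e.g. a quantized weight buffer

    def __deepcopy__(self, memo):
        # Without this, a deepcopy of a tensor-like subclass can drop
        # or share the extra attributes instead of copying them.
        return type(self)(copy.deepcopy(self.data, memo),
                          CB=copy.deepcopy(self.CB, memo))

p = ParamsLike([1.0, 2.0], CB=[3])
q = copy.deepcopy(p)
assert q.CB == p.CB and q.CB is not p.CB  # copied, not shared

class LinearLike:
    def __init__(self, weight):
        type(self).weight = weight  # BUG: assigns to the class, not the instance
        # self.weight = weight      # the fix

a, b = LinearLike(1), LinearLike(2)
assert a.weight == 2  # a's value was clobbered when b was constructed
```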
-
- 29 May, 2024 1 commit
-
-
Benjamin Bossan authored
As discussed internally, use `state = self.__dict__.copy()`, which is also what the Python docs recommend.
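The recommended pattern (from the Python pickle documentation) looks roughly like this — `Weights` here is a hypothetical class, not bitsandbytes code:

```python
import pickle

class Weights:
    def __init__(self):
        self.data = [1, 2, 3]
        self._cache = lambda: None  # transient, non-picklable state

    def __getstate__(self):
        # Copy, so mutating the returned dict never touches the live object,
        # then drop anything that should not be serialized.
        state = self.__dict__.copy()
        del state["_cache"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._cache = lambda: None  # rebuild transient state on load

w = pickle.loads(pickle.dumps(Weights()))
assert w.data == [1, 2, 3] and w._cache() is None
```

Returning `self.__dict__` directly would let pickling code mutate the live instance, which is exactly what the copy avoids.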
-
- 02 Apr, 2024 1 commit
-
-
Matthew Douglas authored
-
- 29 Mar, 2024 1 commit
-
-
Matthew Douglas authored
-
- 13 Mar, 2024 2 commits
-
-
Ruff authored
-
Aarni Koskela authored
-
- 11 Mar, 2024 2 commits
-
-
Aarni Koskela authored
-
Aarni Koskela authored
-
- 06 Mar, 2024 1 commit
-
-
Aarni Koskela authored
-
- 05 Mar, 2024 1 commit
-
-
rdyro authored
-
- 21 Feb, 2024 3 commits
-
-
Marc Sun authored
* fix deepcopy and copy
* add tests
* remove line
* ruff fix
* ruff
* Update tests/test_linear4bit.py (Co-authored-by: Aarni Koskela <akx@iki.fi>)
* add missing state
* ruff format
* ignore formatting commit for git blame
* Params4bit should be initialized as frozen by default
* add test for serialization round-tripping
* add comparison capability for QuantState
* add back accidentally removed line
--------
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
-
Titus authored
-
Titus authored
-
- 05 Feb, 2024 1 commit
-
-
Aarni Koskela authored
fix erroneous correction
Co-authored-by: Titus von Koeller <titus@vonkoeller.com>
-
- 01 Feb, 2024 3 commits
-
-
Aarni Koskela authored
-
Aarni Koskela authored
* test_nvidia_transform: fix variable reference — `out_order` is the global parametrization list, not the test fixture argument
* Make `parametrize` use more idiomatic
* Use a more deterministic helper for `dim*` determination
* Convert NO_CUBLASLT errors into skips too
* Mark slow and benchmark tests as such (allows `-k "not benchmark"`)
-
Aarni Koskela authored
`out_order` is the global parametrization list, not the test fixture argument
-
- 30 Jan, 2024 1 commit
-
-
Aarni Koskela authored
* Adjust Ruff configuration
* do not autofix always
* be less strict around tests and benchmarks
* adjust ignores for now
* Ruff: autofix I and F401
* Apply ruff autofixes
* Fix RUF013 complaint
* Fix mutable default in replace_linear
* Don't use bare except
* Wrap bitsandbytes.__main__ entrypoint in function; fix "sensible" typo
* Fix ruff B008 (function call in arguments)
* Add ruff noqas as suitable
* Fix RUF005 (splat instead of concatenating)
* Fix B018 (useless expression)
* Add pre-commit configuration + GitHub Actions lint workflow
* Fix unused `e` in bitsandbytes/__main__.py
* fix merge conflict resolution error
* run pre-commit hook
--------
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
-
- 24 Jan, 2024 1 commit
-
-
Aarni Koskela authored
* implicitly skip any test that implicitly uses CUDA on a non-CUDA box
* add a `requires_cuda` fixture
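A `requires_cuda` fixture along these lines — a sketch, not necessarily the actual implementation, and using a crude `nvidia-smi` check where the real code would ask the CUDA runtime (e.g. `torch.cuda.is_available()`) — skips GPU tests on machines without CUDA:

```python
import shutil

import pytest

def cuda_available() -> bool:
    # Illustrative stand-in check for a CUDA-capable environment.
    return shutil.which("nvidia-smi") is not None

@pytest.fixture
def requires_cuda():
    # Any test that requests this fixture is skipped on a non-CUDA box.
    if not cuda_available():
        pytest.skip("this test requires a CUDA-capable GPU")
```

A test then opts in simply by naming the fixture as an argument, e.g. `def test_gpu_op(requires_cuda): ...`, with no body changes needed for the skip behavior.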
-
- 17 Jan, 2024 1 commit
-
-
Benjamin Warner authored
This PR adds initial FSDP support for training QLoRA models. It enables basic FSDP and CPU offload support; low-memory training via the FSDP `sync_module_states` option is unsupported. This PR builds off of #840, commit 8278fca, and BNB FSDP by @TimDettmers and @Titus-von-Koeller. An example of using this PR to finetune QLoRA models with FSDP can be found in the demo repo: AnswerDotAi/fsdp_qlora.
* Minimal changes for fp32 4bit storage from BNB commit 8278fca
* Params4bit with selectable storage dtype
* possible fix for double quantizing linear weight & quant storage dtype
* minor fixes in Params4bit for peft tests
* remove redundant
* add float16
* update test
* Remove float16 quant cast as there are fp32, bf16, & fp16 quant kernels
--------
Co-authored-by: Kerem Turgutlu <keremturgutlu@gmail.com>
-
- 08 Jan, 2024 1 commit
-
-
Tim Dettmers authored
-
- 03 Dec, 2023 1 commit
-
-
Titus von Koeller authored
-
- 10 Nov, 2023 1 commit
-
-
Ruslan Svirschevski authored
-
- 09 Nov, 2023 1 commit
-
-
Ruslan Svirschevski authored
-
- 08 Nov, 2023 1 commit
-
-
Ruslan Svirschevski authored
-
- 02 Nov, 2023 5 commits
-
-
Ruslan Svirschevski authored
-
Ruslan Svirschevski authored
-
Ruslan Svirschevski authored
-
Ruslan Svirschevski authored
-
Ruslan Svirschevski authored
-
- 04 Aug, 2023 1 commit
-
-
Tim Dettmers authored
-
- 22 Jul, 2023 1 commit
-
-
Tim Dettmers authored
-
- 19 Jul, 2023 1 commit
-
-
Tim Dettmers authored
-
- 17 Jul, 2023 1 commit
-
-
Ikko Eltociear Ashimine authored
paramters -> parameters
-