- 28 Apr, 2025 1 commit
-
-
Matthew Douglas authored
* Additional 4bit CPU ops
* Additional 4bit CPU ops
* Implement additional device-agnostic ops and test updates
* More test fixes
* int8 tests passing
* Fix feature flag for multi_backend
-
- 22 Apr, 2025 1 commit
-
-
Matthew Douglas authored
* Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit
* Make test suite more device-agnostic
* Additional device agnostic tests
* Additional device agnosticism for tests
* Add BNB_TEST_DEVICE env var to manually select device for unit tests
* Small bugfix for int8 test
* Exclude backward() from code coverage reports
* Params4bit: don't try to quantize when moving to meta device
-
- 27 Mar, 2025 2 commits
-
-
Matthew Douglas authored
* Testing cleanup
* More test cleanup
* Additional deprecations/removals.
* Skip benchmark, deprecated, slow tests by default
-
Matthew Douglas authored
* Drop Python 3.8 support.
* Formatting
-
- 25 Mar, 2025 1 commit
-
-
Matthew Douglas authored
* Sketch out first custom op registration
* Add note
* Initial int8 op registration
* Cleanup some deprecated functions.
* Int8 ops updates; tests
* Implement 4bit quant/dequant ops
* Fix nested quant
* cleanup
* Test improvements
* Clean up and improve tests
* Add higher level custom op for int8 matmul + dequant + bias
* Add gemv 4bit custom op
* Cleanup
* Implement out kwarg overloads for custom ops
* Update PyTorch minimum to 2.1
* Deprecation updates
* Deprecation updates
* Cleanup; rename int8_linear_dequant -> int8_scaled_mm
* Bump min pytorch to 2.2
* cleanup
* Test reorganization
* Remove deprecated supports_igemmlt
* More cleanup
* Cleanup obsolete C++/CUDA code
* Cleanup
* Create 'default' backend for fallback op implementations; initial CPU nf4 work
* Stub out for multi-platform
* Fix serialization tests for torch>=2.6.0
* Add example for torch.compile e2e inference
* Test update
---------
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
-
- 05 Dec, 2024 1 commit
-
-
Matthew Douglas authored
* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
* Fix unintended change
* New naive mm_dequant kernel for row-major; cleanup
* fix
* int8 refactor: initial sparse decomp, cleanup
* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
* int8: inference optimizations, some cleanup
* int8: more tests passing, cleanup
* int8 - more cleanup, most tests passing
* int8: specify CUDA stream for int8 ops
* perf: reduce overhead from getting cudaStream ptr
* Mark some functions for deprecation.
* int8 sparse decomp: small perf improvement
* update setup.py
* Update bitsandbytes/autograd/_functions.py Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/research/autograd/_functions.py Co-authored-by: Aarni Koskela <akx@iki.fi>
* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
* int8 cleanup
* Ignore ruff rule ISC001 (incompatible with formatter)
* add comment
* int8 more cleanup
* Update bitsandbytes/functional.py Co-authored-by: Aarni Koskela <akx@iki.fi>
* int8: rename / deprecate old fn signatures
* Update bitsandbytes/functional.py Co-authored-by: Aarni Koskela <akx@iki.fi>
* type annotation
* format update
* Update bitsandbytes/research/autograd/_functions.py Co-authored-by: Aarni Koskela <akx@iki.fi>
* cleanup
* Add comment to explain division optimization
* more cleanup
* Update bitsandbytes/functional.py Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py Co-authored-by: Aarni Koskela <akx@iki.fi>
* cleanup
* Type annotations, cleanup
* remove unused kernels; improved type annotations
* small perf optimization for single-GPU systems
* small perf optimization for single-GPU systems
* update docstrings
* Improve docs and tests
* Update docstring
* Update test
* add benchmarking script
* test cleanup: add deprecated marker, move benchmarks out
* Add int8 dequant function; misc improvements
* int8 matmul fallback for inner dims not divisible by 4
* improve register usage of kInt8VectorQuant - especially for A100/H100
* disable fail-fast for package build
* maxwell compat
* ptxas verbose
* docs update
* doc update
* backward fix
* Bugfix sparse decomp
* Int8 fix for PEFT OLoRA init
* Fix test for deprecated spmm_coo
* test improvement
* doc update
* typo
* doc cleanup
* docs
* add inference benchmark script
* Add benchmarks, doc update
---------
Co-authored-by: Aarni Koskela <akx@iki.fi>
-
- 29 Mar, 2024 1 commit
-
-
Matthew Douglas authored
-
- 13 Mar, 2024 1 commit
-
-
Ruff authored
-
- 21 Feb, 2024 1 commit
-
-
Titus authored
-
- 01 Feb, 2024 3 commits
-
-
Aarni Koskela authored
-
Aarni Koskela authored
* test_nvidia_transform: fix variable reference (`out_order` is the global parametrization list, not the test fixture argument)
* Make `parametrize` use more idiomatic
* Use a more deterministic helper for `dim*` determination
* Convert NO_CUBLASLT errors into skips too
* Mark slow and benchmark tests as such (allows `-k "not benchmark"`)
-
Aarni Koskela authored
`out_order` is the global parametrization list, not the test fixture argument
-
- 30 Jan, 2024 1 commit
-
-
Aarni Koskela authored
* Adjust Ruff configuration
* do not autofix always
* be less strict around tests and benchmarks
* adjust ignores for now
* Ruff: autofix I and F401
* Apply ruff autofixes
* Fix RUF013 complaint
* Fix mutable default in replace_linear
* Don't use bare except
* Wrap bitsandbytes.__main__ entrypoint in function; fix "sensible" typo
* Fix ruff B008 (function call in arguments)
* Add ruff noqas as suitable
* Fix RUF005 (splat instead of concatenating)
* Fix B018 (useless expression)
* Add pre-commit configuration + GitHub Actions lint workflow
* Fix unused `e` in bitsandbytes/__main__.py
* fix merge conflict resolution error
* run pre-commit hook
---------
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
-
- 24 Jan, 2024 1 commit
-
-
Aarni Koskela authored
* implicitly skip any test that implicitly uses CUDA on a non-CUDA box
* add a `requires_cuda` fixture
-
- 17 Jan, 2024 1 commit
-
-
Benjamin Warner authored
This PR adds initial FSDP support for training QLoRA models. It enables basic FSDP and CPU offload support; low-memory training via the FSDP `sync_module_states` option is unsupported. This PR builds on #840 (commit 8278fca) and on BNB FSDP by @TimDettmers and @Titus-von-Koeller. An example of using this PR to finetune QLoRA models with FSDP can be found in the demo repo: AnswerDotAi/fsdp_qlora.
* Minimal changes for fp32 4bit storage from BNB commit 8278fca
* Params4bit with selectable storage dtype
* possible fix for double quantizing linear weight & quant storage dtype
* minor fixes in Params4bit for peft tests
* remove redundant
* add float16
* update test
* Remove float16 quant cast as there are fp32, bf16, & fp16 quant kernels
---------
Co-authored-by: Kerem Turgutlu <keremturgutlu@gmail.com>
-
- 08 Jan, 2024 1 commit
-
-
Tim Dettmers authored
-
- 02 Nov, 2023 2 commits
-
-
Ruslan Svirschevski authored
-
Ruslan Svirschevski authored
-
- 04 Aug, 2023 1 commit
-
-
Tim Dettmers authored
-
- 19 Jul, 2023 1 commit
-
-
Tim Dettmers authored
-
- 12 Jul, 2023 1 commit
-
-
Tim Dettmers authored
-
- 11 Jul, 2023 1 commit
-
-
Tim Dettmers authored
-
- 10 Jul, 2023 3 commits
-
-
Tim Dettmers authored
-
Tim Dettmers authored
-
Tim Dettmers authored
-
- 09 Jul, 2023 3 commits
-
-
Tim Dettmers authored
-
Tim Dettmers authored
-
Tim Dettmers authored
-
- 08 Jul, 2023 2 commits
-
-
Tim Dettmers authored
-
Tim Dettmers authored
-
- 05 Jul, 2023 1 commit
-
-
Tim Dettmers authored
-
- 04 Jul, 2023 2 commits
-
-
Tim Dettmers authored
-
Tim Dettmers authored
-
- 31 May, 2023 2 commits
-
-
Tim Dettmers authored
-
Tim Dettmers authored
-
- 24 May, 2023 1 commit
-
-
Tim Dettmers authored
-
- 06 May, 2023 2 commits
-
-
Tim Dettmers authored
-
Tim Dettmers authored
-
- 02 May, 2023 2 commits
-
-
Tim Dettmers authored
-
Tim Dettmers authored
-