- 02 Aug, 2025 1 commit
Mohamed Hisham authored

- 14 Jul, 2025 6 commits

Matthew Douglas authored

Egor Krivov authored

Egor Krivov authored

Egor Krivov authored

Egor Krivov authored

Egor Krivov authored

- 20 Jun, 2025 1 commit

pnunna93 authored
* Port ROCm changes from multi-backend-refactor branch * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update ops.py * Update functional.py * Update functional.py * Update functional.py * Update test_ops.py * Update test_functional.py * Update test_ops.py * Update test_functional.py * Update test_functional.py * Update functional.py * Update functional.py * Update ops.py * Update ops.py * Update test_functional.py * Update test_functional.py * Update cextension.py * Update cuda_specs.py * Update cuda_specs.py * Update test_functional.py * Update test_linear4bit.py * Update test_cuda_setup_evaluator.py * Update test_functional.py * Update modules.py * Update modules.py * Update ops.py * Update test_linear4bit.py * Update ops.py * Update ops.py * Update test_linear4bit.py * Update test_linear4bit.py * Update python-package.yml * Update python-package.yml * Update python-package.yml * Update python-package.yml * Create build-rocm.sh * Update cuda_specs.py * Fix trailing whitespace * Remove conflicts.diff * update for hipblasVersionMajor >=3 * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update main.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Update test_linear4bit.py * Lint * Lint * Update helpers.py * Update test_functional.py * Update test_linear4bit.py * Update test_ops.py * Lint * Update pythonInterface.cpp * lint fix * lint * Update pythonInterface.cpp * revert permissions change * Fix indentation * Update kernels_hip.cuh * Update kernels.hip * Update ops.hip * Update ops_hip.cuh * Update kernels_hip.cuh * Update kernels.hip * Update 
kernels.hip * Update ops.hip * Update ops_hip.cuh * Update ops.hip * Update CMakeLists.txt * Update functional.py * Update cextension.py * Update cextension.py
Co-authored-by: MISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
Co-authored-by: MISHANMAUYRA <mishanmaurya31081@gmail.com>
Co-authored-by: amcamd <andrew.chapman@amd.com>
Co-authored-by: Prasanth Nunna <root@banff-cyxtera-s78-1.amd.com>
- 18 Jun, 2025 1 commit

Chetan Kumar Verma authored

- 17 Jun, 2025 1 commit

Matthew Douglas authored
* Setup XPU CI * CI: expand XPU matrix * test * test * test * test * test * test * test * test * test * test * skip some fp4 tests on hpu * skip some fp4 tests on hpu * skip gemv tests on hpu * test * Additional test patches for HPU * HPU test update * HPU test update * HPU test update * HPU test update * Format
- 16 Jun, 2025 1 commit

Chetan Kumar Verma authored

- 11 Jun, 2025 2 commits

Egor authored

Dmitrii Makarenko authored
* [xpu/triton] Add Triton dequantization kernel. This PR adds an XPU backend and a Triton kernel for dequantizing the nf4 dtype; Triton is an optional import. Tests: tests/test_functional.py::TestQuantize4BitFunctional (supported nf4/fp4 cases); tests/test_functional.py::Test8BitBlockwiseQuantizeFunctional (implements quantize_blockwise with a binary search that runs faster on XPU); tests/test_linear4bit.py. Signed-off-by: Dmitrii Makarenko <dmitrii.makarenko@intel.com> * align with ipex code * enable test for ipex * test_kbit_backprop: skip no longer needed * remove unused
Signed-off-by: Dmitrii Makarenko <dmitrii.makarenko@intel.com>
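The quantize_blockwise speedup mentioned in this commit comes from finding each value's nearest entry in a sorted quantization code table by binary search rather than a linear scan. A minimal pure-Python sketch of that idea, assuming a sorted code table in [-1, 1]; the real kernel is a Triton implementation, and the function names and block size here are illustrative:

```python
from bisect import bisect_left

def quantize_blockwise(values, code, blocksize=4):
    """Quantize `values` in blocks: scale each block by its absmax, then map
    each scaled value to the nearest entry of the sorted `code` table via
    binary search (O(log n) per element instead of a linear scan)."""
    indices, absmaxes = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        absmax = max(abs(v) for v in block) or 1.0  # avoid div-by-zero
        absmaxes.append(absmax)
        for v in block:
            x = v / absmax  # normalized into the code's [-1, 1] range
            i = bisect_left(code, x)
            # bisect gives the insertion point; pick the closer neighbor
            if i == 0:
                indices.append(0)
            elif i == len(code):
                indices.append(len(code) - 1)
            else:
                indices.append(i if code[i] - x < x - code[i - 1] else i - 1)
    return indices, absmaxes

def dequantize_blockwise(indices, absmaxes, code, blocksize=4):
    """Inverse: look up each code entry and rescale by its block's absmax."""
    return [code[idx] * absmaxes[pos // blocksize]
            for pos, idx in enumerate(indices)]
```

With a tiny 5-entry code table, a block of [1.0, 2.0, -2.0, 0.0] round-trips exactly because every normalized value lands on a code point.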
- 08 Jun, 2025 1 commit

Matthew Douglas authored

- 06 Jun, 2025 1 commit

Matthew Douglas authored

- 04 Jun, 2025 1 commit

Matthew Douglas authored
* Deprecation cleanup: remove histogram_scatter_add_2d * Deprecation cleanup: vectorwise_mm_dequant * Deprecation cleanup: vectorwise_quant * Remove unused test * Optimizer test cleanup * Deprecations: remove estimate_quantiles, create_quantile_map * Move deprecated test
- 03 Jun, 2025 1 commit

Matthew Douglas authored

- 02 Jun, 2025 2 commits

Matthew Douglas authored
* Tests: add linux x64 cpu+ipex to nightly CI workflow * typo * Tests: guard linear8bit compile test for ipex cpu issue
Matthew Douglas authored
* Tests: xfail opcheck for 4bit quantization with floating storage dtypes * Tests: xfail opcheck for 4bit quantization with floating storage dtypes * Tests: skip test_gemv_eye_4bit on CPU with bf16 when not supported by torch * Tests: skip test_gemv_eye_4bit on CPU with bf16 when not supported by torch
- 28 May, 2025 1 commit

jiqing-feng authored
* enable ipex
* fix cpu 8bit quantization
* fix int8 and nf4 cpu inference
* add cpu fp4 and rem
* fix dequantize nf4 xpu
* fix ipex op
* fix dequantize nf4 name
* fix dequantize nf4 ipex
* fix matmul8bitfp
* enable cpu tests
* fix format
* fix quantize blockwise output shape
* fix quant_storage bf16 and gemv cpu
* fix cpu tests
* fix xpu tests
* fix lib
* skip xpu dequantize blockwise op check
* fix matmul8bit
* skip unused function tests
* fix matmul8bit fp
* check ipex before MatMul8bitFp
* update ipex install guide
* update install guide
* fix error log
* fix error log
* update comment
* move torch op to default
* revert ipex check
* fix code table device
* fix code table device
* fix xpu ops
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
- 24 May, 2025 2 commits

Matthew Douglas authored
* Add torch.compile tests * Tests: WA aarch64 CPU regressions for torch 2.6.0; add Windows torch==2.7.0+cu118 test config * Tests: skip torch.compile for cuda on windows
Matthew Douglas authored
* General cleanup & test improvements * Tests: WA numpy 2 compat issue for torch<2.3 * Tests: update aarch64 cpu min torch version * Tests: update aarch64 cpu min torch version * Tests: update aarch64 cpu min torch version
- 21 May, 2025 1 commit

Matthew Douglas authored

- 19 May, 2025 1 commit

Matthew Douglas authored
* Test g5g runner * Switch L4 to L40S runner; swap GitHub Linux T4 runner for AWS g4dn * Run tests on last 2 pytorch stable releases * Run tests on last 2 pytorch stable releases
- 13 May, 2025 1 commit

Matthew Douglas authored
* Improvements for testing suite * Add workflow for macOS arm64 CPU tests
- 29 Apr, 2025 1 commit

Matthew Douglas authored
* Run unit tests on GH Actions * fix * fix * trigger workflow * Update * Update * Update * Run tests nightly * Disable paged optimizer test on Windows * Skip unit tests on Windows for CUDA 12.x (driver on runner is too old)
- 28 Apr, 2025 1 commit

Matthew Douglas authored
* Additional 4bit CPU ops * Additional 4bit CPU ops * Implement additional device-agnostic ops and test updates * More test fixes * int8 tests passing * Fix feature flag for multi_backend
- 22 Apr, 2025 1 commit

Matthew Douglas authored
* Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit * Make test suite more device-agnostic * Additional device agnostic tests * Additional device agnosticism for tests * Add BNB_TEST_DEVICE env var to manually select device for unit tests * Include device support tags for transformers multi-backend compatibility; add xpu() and cpu() to Params4bit * Make test suite more device-agnostic * Additional device agnostic tests * Additional device agnosticism for tests * Add BNB_TEST_DEVICE env var to manually select device for unit tests * Small bugfix for int8 test * Exclude backward() from code coverage reports * Params4bit: don't try to quantize when moving to meta device
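The BNB_TEST_DEVICE variable added in this commit lets the test suite target a specific device. A hedged sketch of what such a selection helper might look like; the function name, fallback order, and the set of accepted device strings are illustrative assumptions, not the project's actual implementation:

```python
import os

def resolve_test_device(default="cpu"):
    """Pick the device for unit tests from the BNB_TEST_DEVICE environment
    variable, falling back to `default` when unset or empty. Accepts values
    such as 'cpu', 'cuda', 'cuda:1', or 'xpu' (illustrative list)."""
    device = os.environ.get("BNB_TEST_DEVICE", "").strip() or default
    known = ("cpu", "cuda", "xpu", "mps", "hpu")
    base = device.split(":", 1)[0]  # strip an optional ':index' suffix
    if base not in known:
        raise ValueError(f"Unsupported BNB_TEST_DEVICE: {device!r}")
    return device
```

A pytest suite could then call this once in a fixture and parametrize device-dependent tests with the result.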
- 27 Mar, 2025 2 commits

Matthew Douglas authored
* Testing cleanup * More test cleanup * Additional deprecations/removals. * Skip benchmark, deprecated, slow tests by default
Matthew Douglas authored
* Drop Python 3.8 support. * Formatting
- 25 Mar, 2025 1 commit

Matthew Douglas authored
* Sketch out first custom op registration * Add note * Initial int8 op registration * Cleanup some deprecated functions. * Int8 ops updates; tests * Implement 4bit quant/dequant ops * Fix nested quant * cleanup * Test improvements * Clean up and improve tests * Add higher level custom op for int8 matmul + dequant + bias * Add gemv 4bit custom op * Cleanup * Implement out kwarg overloads for custom ops * Update PyTorch minimum to 2.1 * Deprecation updates * Deprecation updates * Cleanup; rename int8_linear_dequant -> int8_scaled_mm * Bump min pytorch to 2.2 * cleanup * Test reorganization * Remove deprecated supports_igemmlt * More cleanup * Cleanup obsolete C++/CUDA code * Cleanup * Create 'default' backend for fallback op implementations; initial CPU nf4 work * Stub out for multi-platform * Fix serialization tests for torch>=2.6.0 * Add example for torch.compile e2e inference * Test update
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
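The "'default' backend for fallback op implementations" in this commit refers to registering a device-agnostic implementation that is used whenever no device-specific kernel exists. The real code registers ops through PyTorch's torch.library machinery; the dictionary-based registry below is a deliberately simplified pure-Python sketch of the same dispatch idea, with illustrative op and function names:

```python
# Simplified sketch of per-device op dispatch with a "default" fallback
# backend. The registry, decorator, and op names are illustrative; the
# actual project uses torch.library registration.
_registry = {}  # (op_name, device) -> implementation

def register(op_name, device):
    """Decorator that records `fn` as the implementation of `op_name`
    for the given device (or for the 'default' fallback backend)."""
    def decorator(fn):
        _registry[(op_name, device)] = fn
        return fn
    return decorator

def dispatch(op_name, device, *args, **kwargs):
    # Prefer a device-specific kernel; otherwise fall back to "default".
    fn = _registry.get((op_name, device)) or _registry.get((op_name, "default"))
    if fn is None:
        raise NotImplementedError(f"no implementation for {op_name} on {device}")
    return fn(*args, **kwargs)

@register("int8_scaled_mm", "default")
def _int8_scaled_mm_fallback(a, b):
    # Naive reference matmul used when no backend-specific kernel
    # (CUDA, XPU, ...) has been registered for the device.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]
```

Calling dispatch for a device with no registered kernel (say "cuda" here) silently falls through to the default implementation, which is exactly the fallback behavior the commit describes.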
- 05 Dec, 2024 1 commit

Matthew Douglas authored
* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
* Fix unintended change
* New naive mm_dequant kernel for row-major; cleanup
* fix
* int8 refactor: initial sparse decomp, cleanup
* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
* int8: inference optimizations, some cleanup
* int8: more tests passing, cleanup
* int8 - more cleanup, most tests passing
* int8: specify CUDA stream for int8 ops
* perf: reduce overhead from getting cudaStream ptr
* Mark some functions for deprecation.
* int8 sparse decomp: small perf improvement
* update setup.py
* Update bitsandbytes/autograd/_functions.py
* Update bitsandbytes/functional.py
* Update bitsandbytes/functional.py
* Update bitsandbytes/research/autograd/_functions.py
* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
* int8 cleanup
* Ignore ruff rule ISC001 (incompatible with formatter)
* add comment
* int8 more cleanup
* Update bitsandbytes/functional.py
* int8: rename / deprecate old fn signatures
* Update bitsandbytes/functional.py
* type annotation
* format update
* Update bitsandbytes/research/autograd/_functions.py
* cleanup
* Add comment to explain division optimization
* more cleanup
* Update bitsandbytes/functional.py
* Update bitsandbytes/functional.py
* Update bitsandbytes/functional.py
* cleanup
* Type annotations, cleanup
* remove unused kernels; improved type annotations
* small perf optimization for single-GPU systems
* small perf optimization for single-GPU systems
* update docstrings
* Improve docs and tests
* Update docstring
* Update test
* add benchmarking script
* test cleanup: add deprecated marker, move benchmarks out
* Add int8 dequant function; misc improvements
* int8 matmul fallback for inner dims not divisible by 4
* improve register usage of kInt8VectorQuant - especially for A100/H100
* disable fail-fast for package build
* maxwell compat
* ptxas verbose
* docs update
* doc update
* backward fix
* Bugfix sparse decomp
* Int8 fix for PEFT OLoRA init
* Fix test for deprecated spmm_coo
* test improvement
* doc update
* typo
* doc cleanup
* docs
* add inference benchmark script
* Add benchmarks, doc update
Co-authored-by: Aarni Koskela <akx@iki.fi>
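The int8_scaled_mm rename in this refactor refers to an int8 matmul followed by dequantization with the quantization scales. A small pure-Python sketch of the underlying arithmetic, assuming row-wise absmax quantization; the real kernels operate on tensors with CUDA streams, and the helper names here are illustrative:

```python
def quantize_rows(matrix):
    """Row-wise absmax quantization to int8: scale each row so that its
    largest magnitude maps to 127, and keep the per-row scale."""
    q, scales = [], []
    for row in matrix:
        absmax = max(abs(v) for v in row) or 1.0  # avoid div-by-zero
        scales.append(absmax / 127.0)
        q.append([round(v * 127.0 / absmax) for v in row])
    return q, scales

def int8_scaled_mm(a_q, a_scales, b_q, b_scales):
    """Integer matmul with int accumulation, then dequantize each output
    element by the product of the corresponding row scales. `b_q` holds the
    rows of B, used as the columns of the product (i.e. this computes
    an approximation of A @ B.T)."""
    out = []
    for i, row in enumerate(a_q):
        out.append([sum(x * y for x, y in zip(row, b_row)) * a_scales[i] * b_scales[j]
                    for j, b_row in enumerate(b_q)])
    return out
```

Quantizing both operands and multiplying reproduces the float product up to quantization error on the order of 1/127 per operand.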
- 20 Sep, 2024 2 commits

Matthew Douglas authored
* Change 8bit optimizer blocksize 2048->256; additional bf16 support * Update tolerances for 8bit optimizer tests
Matthew Douglas authored
* Add AdEMAMix optimizer * Add PagedAdEMAMix32bit, AdEMAMix32bit * Add PagedAdEMAMix32bit, AdEMAMix32bit * AdEMAMix: add support for alpha/beta3 scheduling * Update paged AdEMAMix
- 14 Aug, 2024 1 commit

Matthew Douglas authored
- 06 Aug, 2024 1 commit

Vladimir Malinovskii authored
* Embedding4bit and Embedding8bit implementation
* lint
* Update bitsandbytes/nn/modules.py
* Update bitsandbytes/nn/modules.py
* Update bitsandbytes/nn/modules.py
* saving -> Saving
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
- 15 Jul, 2024 1 commit

Vladimir Malinovskii authored
* fixed test_4bit_warnings on cpu-only platforms * fixed linear8bit-based tests for cpu only platforms
- 30 May, 2024 1 commit

Benjamin Bossan authored
This requires implementing the __deepcopy__ method in Int8Params. Moreover, there was an issue in the Linear8bitLt constructor that would assign instance attributes to the class, which is now fixed. Please review carefully that this does not impact existing code. Tests that I ran:
- pytest tests/test_linear8bitlt.py
- in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_gpu_examples.py
- in PEFT: python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_common_gpu.py
- in transformers: RUN_SLOW=1 python -m pytest tests/quantization/bnb -x
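The general shape of the fix is a __deepcopy__ that duplicates the parameter's data and quantization metadata rather than sharing them. This is a minimal sketch of that pattern, not the real Int8Params; both class names and the attribute layout are illustrative assumptions:

```python
import copy

class QuantState:
    """Toy stand-in for quantization metadata carried by a parameter."""
    def __init__(self, scale):
        self.scale = scale

class Int8ParamsLike:
    """Illustrative sketch (not the real Int8Params): a class holding
    quantized data plus metadata needs __deepcopy__ so copy.deepcopy
    duplicates the metadata instead of aliasing it across copies."""
    def __init__(self, data, state):
        self.data = data
        self.state = state

    def __deepcopy__(self, memo):
        # Deep-copy each attribute, passing `memo` along so shared or
        # recursive references stay consistent, then register the new
        # object in `memo` keyed by the original's id.
        new = type(self)(copy.deepcopy(self.data, memo),
                         copy.deepcopy(self.state, memo))
        memo[id(self)] = new
        return new
```

After copy.deepcopy, mutating the copy's state no longer touches the original, which is the property the PEFT and transformers tests above exercise indirectly.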
- 29 May, 2024 1 commit

Benjamin Bossan authored
As discussed internally, use state = self.__dict__.copy(), which is also what the Python docs recommend.
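The recommendation above is the standard __getstate__ pattern from the Python pickle docs: return a copy of the instance dict so serialization cannot alias the live object's attributes. A minimal sketch, with an illustrative class name standing in for the real module:

```python
import copy

class Linear8bitLtLike:
    """Illustrative sketch (not the real Linear8bitLt): __getstate__
    returns a *copy* of __dict__ so pickling and deepcopy consumers
    cannot mutate or alias the live instance's attribute dict."""
    def __init__(self, weight):
        self.weight = weight
        self.initialized = True

    def __getstate__(self):
        # Copy, as the Python docs recommend; returning self.__dict__
        # directly would hand out a reference to the live dict.
        return self.__dict__.copy()

    def __setstate__(self, state):
        self.__dict__.update(state)
```

copy.deepcopy (like pickle) goes through __getstate__/__setstate__, so the copied object carries its own attribute values.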
- 02 Apr, 2024 1 commit

Matthew Douglas authored