Commits · 90f54199cfc523432cd4b966089b431b0812f0c5 · OpenDAS / bitsandbytes

30 Sep, 2025 1 commit

Documentation updates for v0.48.0 (#1770) · 90f54199

Matthew Douglas authored Sep 30, 2025

* Update installation docs

* Update links

* Fix cuda min glibc in doc

* Update header levels

* Update AMD section

* typo

90f54199

29 Sep, 2025 2 commits
- Linear8bitLt: support device movement after forward() (#1769) · b8d1c261
  Matthew Douglas authored Sep 29, 2025
  
  b8d1c261
- ROCm: Add 6.4 and 7.0 builds (#1767) · 42e8abc3
  Matthew Douglas authored Sep 29, 2025
  
  42e8abc3
26 Sep, 2025 2 commits

Add Thor support (#1764) · 50be19c3

Jun Jiang authored Sep 27, 2025



* Update build-cuda.sh

---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

50be19c3

Update workflow for packaging (#1766) · c72aae78

Matthew Douglas authored Sep 26, 2025

* Update workflow for packaging

* Run the workflow when files change in `.github/scripts/**`
* Shorter names for build jobs
* Change Windows CUDA install back to "network" using subpackages

* Update

* Modify sub-packages for CUDA version handling

Updated sub-packages to conditionally include 'crt' for CUDA 13.

* Update CUDA sub-packages in workflow configuration

* Change CUDA install method to 'local' for version 13 on Windows

* Modify CUDA sub-packages for version 13 support

* Change CUDA install method to 'network' in workflow

* CUDA build script: only install security updates in container

* CUDA build script: only install security updates in container

* Pin macos build runner to macos-15 and windows to windows-2025

* ROCm build: remove unneeded build step

c72aae78

25 Sep, 2025 1 commit
- Build/Package Intel XPU binary for Linux (#1763) · 09ea8618
  Matthew Douglas authored Sep 25, 2025
```
* Intel XPU: build and package binary for Linux

* Update artifact name
```
  09ea8618
24 Sep, 2025 1 commit

Fix for warpSize deprecation in ROCm 7.0 (#1762) · b72b766e

pnunna93 authored Sep 24, 2025



* Port ROCm changes from multi-backend-refactor branch

* Update ops.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update test_ops.py

* Update test_functional.py

* Update test_ops.py

* Update test_functional.py

* Update test_functional.py

* Update functional.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update test_functional.py

* Update test_functional.py

* Update cextension.py

* Update cuda_specs.py

* Update cuda_specs.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_cuda_setup_evaluator.py

* Update test_functional.py

* Update modules.py

* Update modules.py

* Update ops.py

* Update test_linear4bit.py

* Update ops.py

* Update ops.py

* Update test_linear4bit.py

* Update test_linear4bit.py

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Create build-rocm.sh

* Update cuda_specs.py

* Fix trailing whitespace

* Remove conflicts.diff

* update for hipblasVersionMajor >=3

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Update main.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Update test_linear4bit.py

* Lint

* Lint

* Update helpers.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Lint

* Update pythonInterface.cpp

* lint fix

* lint

* Update pythonInterface.cpp

* revert permissions change

* Fix indentation

* Update kernels_hip.cuh

* Update kernels.hip

* Update ops.hip

* Update ops_hip.cuh

* Update kernels_hip.cuh

* Update kernels.hip

* Update kernels.hip

* Update ops.hip

* Update ops_hip.cuh

* Update ops.hip

* Update CMakeLists.txt

* Update functional.py

* Update cextension.py

* Update cextension.py

* warpSize is being made non constexpr in ROCm 7.0

* Merge pull request #90 from ROCm/IFU-rocm_enabled-09-23-2025

Ifu rocm enabled 09 23 2025

* Fix typo

* unskip test_4bit_quant

---------
Co-authored-by: MISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
Co-authored-by: MISHANMAUYRA <mishanmaurya31081@gmail.com>
Co-authored-by: amcamd <andrew.chapman@amd.com>
Co-authored-by: Prasanth Nunna <root@banff-cyxtera-s78-1.amd.com>
Co-authored-by: sstamenk <strahinja.stamenkovic@amd.com>

b72b766e

23 Sep, 2025 1 commit

Add CUDA 13.0 Support (#1761) · bdb8b2b7

Matthew Douglas authored Sep 23, 2025

* CUDA 13 build enablement

* Try to fix Windows build workflow

* Add torch 2.9+cu130 to tests

* Fix python version

* Update test workflow

* Don't test CPU on torch 2.9 yet

* Update doc

bdb8b2b7

22 Sep, 2025 1 commit
- Update README.md · e8170363
  Matthew Douglas authored Sep 22, 2025
  
  e8170363
19 Sep, 2025 2 commits
- Add function to reverse 4bit weights for HPU (#1757) · 2adcb7a7
  Vivek Goel authored Sep 19, 2025
```
* Add function to reverse 4bit weights for HPU

* Fix lint error
```
  2adcb7a7
- Update log (#1758) · b2a8a156
  YangKai0616 authored Sep 19, 2025
  
  b2a8a156
18 Sep, 2025 1 commit

[CUDA] Branchless NF4/FP4 kDequantizeBlockwise kernel for faster dequantization (#1746) · b1f80b8a

Mohamed Hisham authored Sep 18, 2025

* Added branchless LUT-based dequantization for FP4 and NF4

* Added extra command line options to control reproducibility

* Restore FP4 quantization/dequantization order

b1f80b8a

16 Sep, 2025 3 commits
- Bump minimum PyTorch to 2.3 (#1754) · c9bce2b4
  Matthew Douglas authored Sep 16, 2025
```
* Bump minimum PyTorch to 2.3

* Tests: Fix Windows numpy<2 compatibility for torch<2.4.1
```
  c9bce2b4
- Drop Maxwell (sm50) build from distribution (#1755) · dd1929ba
  Matthew Douglas authored Sep 16, 2025
  
  dd1929ba
- [XPU] Implemented 8bit optimizers in triton (#1692) · 404e2776
  Egor authored Sep 16, 2025
```
* implemented 8bit optimizers

* Add interface

* Commented out torch checks

* Merged

* Updated kernels

* Reused code for quant/dequant

* Removed empty line

* Changed Readme
```
  404e2776
15 Sep, 2025 4 commits

Lint fix · 4b025748
Matthew Douglas authored Sep 15, 2025

4b025748

Add SYCL Kernels for XPU backend (#1679) · 1813b058

Liu Xiaoli authored Sep 15, 2025



* Add SYCL Kernels for XPU backend

* fix transpose
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix log and format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert cpu changes
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* clean ipex_xpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* clean ipex import
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix ipex cpu import
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comments
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* refine gemv_4bit kernel

* enable FP4 for dequant_4bit and gemv_4bit

* refine FP4 dequantization performance

* remove check for better performance
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix doc
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* clean code

* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm comments
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix memory issue

* fix ut failure

* adjust threshold
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* change test_functional check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix test_module
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Enable Windows build and refine code

* fix xpu log
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* remove ipex entirely
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cpu int8 CB
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix lint
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix logs (#12)

* fix logs
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix sycl lint error and tests (#13)

* fix sycl nd
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip typo check for xpu kernel codes (#14)

* skip test for xpu ops
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix lint
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip typo for xpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* register triton kernel for quantization (#15)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix version comparison issue (#18)

# Description

The version comparison expression accesses the `.release` property on only one of the two version objects, so a tuple is compared against a `Version` object.

# Error message
```
The 8-bit optimizer is not available on your device, only available on CUDA for now.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Traceback (most recent call last):
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/unsloth_validation/run.py", line 1, in <module>
    import unsloth
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/__init__.py", line 235, in <module>
    from .models import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
    from .llama     import FastLlamaModel
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/llama.py", line 23, in <module>
    from ._utils import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/_utils.py", line 89, in <module>
    from unsloth_zoo.patching_utils import (
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth_zoo/patching_utils.py", line 629, in <module>
    import transformers.integrations.bitsandbytes
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 20, in <module>
    import bitsandbytes as bnb
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/__init__.py", line 39, in <module>
    from .backends.xpu import ops as xpu_ops
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/backends/xpu/ops.py", line 17, in <module>
    if version.parse(torch.__version__).release >= version.parse("2.9"):
TypeError: '>=' not supported between instances of 'tuple' and 'Version'
```
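
The failure in the traceback above can be reproduced without torch. The following is a minimal sketch, not the change made in the commit: it assumes the third-party `packaging` library is installed, and the helper names `is_at_least` and `is_at_least_release` are hypothetical. It shows two ways to keep both sides of the comparison the same type.

```python
from packaging import version


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare Version against Version (hypothetical helper)."""
    return version.parse(installed) >= version.parse(minimum)


def is_at_least_release(installed: str, minimum: tuple) -> bool:
    """Compare release tuples only, ignoring dev/pre/local suffixes (hypothetical helper)."""
    return version.parse(installed).release >= minimum


# The failing pattern from the traceback: a tuple on the left, a Version on the right.
try:
    version.parse("2.9.0").release >= version.parse("2.9")
except TypeError as exc:
    print(f"mixed comparison fails: {exc}")

# Version-to-Version comparison follows PEP 440 ordering,
# so a dev build of 2.9.0 sorts before the 2.9 release.
print(is_at_least("2.9.0+xpu", "2.9"))
print(is_at_least("2.9.0.dev20250901", "2.9"))

# Tuple-to-tuple comparison looks only at the numeric release segment.
print(is_at_least_release("2.9.0.dev20250901", (2, 9)))
```

Which form is appropriate depends on whether nightly or dev builds should count as meeting the minimum: the `Version` comparison excludes them, while the release-tuple comparison includes them.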

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Er-Xin (Edwin) Shang <shangerxin@hotmail.com>

1813b058

[XPU] Implemented 32bit optimizers in triton (#1710) · 275671be

YangKai0616 authored Sep 15, 2025



* Implemented 32bit optimizers in triton

* Modify Comments

* Optimizing pure torch implementation

* Restore the order of parameters and modify the position of pure pytorch implementation

* Restore files permissions

---------
Co-authored-by: Fanli Lin <fanli.lin@intel.com>

275671be

Lint fix · d848d4db
Matthew Douglas authored Sep 15, 2025

d848d4db

09 Sep, 2025 1 commit

Test improvements (#1750) · 6a07ffe0

Matthew Douglas authored Sep 09, 2025

* Test suite improvements for MPS/XPU/HPU

* Skip test on torch==2.8.0+cpu for Windows regression

6a07ffe0

08 Sep, 2025 2 commits

Adjust 4bit test tolerance on CPU for larger blocksizes (#1749) · d731fc42
Matthew Douglas authored Sep 08, 2025

d731fc42

4bit quantization for arbitrary `nn.Parameter` (#1720) · 27549fb0

Matthew Douglas authored Sep 08, 2025

* Add parametrize util for targeting parameters outside of nn.Linear modules

* Parametrize 4bit: replace existing prequantized weight

* cleanup

* Add caching for parametrization

* Add tests

* Fix tests

* Guard for torch < 2.5

* Guard for torch < 2.5

* Another test guard for torch >= 2.5

27549fb0

03 Sep, 2025 2 commits

For the Intel XPU case, use MatMul8bitFp even when IPEX is not used (#1728) · 39dd8471

kaixuanliu authored Sep 04, 2025



* For the Intel XPU case, use MatMul8bitFp even when IPEX is not used
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix lint issue
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

39dd8471

add int mm for xpu after torch 2.9 (#1736) · a09d05a0

jiqing-feng authored Sep 04, 2025



* add int mm for xpu after torch 2.9
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add packaging on pyproject
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

a09d05a0

02 Sep, 2025 1 commit

Enable F841 (#1727) · c76e208f

Yuanyuan Chen authored Sep 02, 2025



* Fix unused variable warnings and other ruff warnings
Signed-off-by: cyy <cyyever@outlook.com>

* Fix format
Signed-off-by: cyy <cyyever@outlook.com>

---------
Signed-off-by: cyy <cyyever@outlook.com>

c76e208f

25 Aug, 2025 1 commit
- add py.typed (#1726) · ff389db7
  Yuanyuan Chen authored Aug 26, 2025
```
Signed-off-by: cyy <cyyever@outlook.com>
```
  ff389db7
11 Aug, 2025 4 commits
- Restore temporary changes from release · 7bfe923c
  Matthew Douglas authored Aug 11, 2025
  
  7bfe923c
- Bump dev version · 9088107f
  Matthew Douglas authored Aug 11, 2025
  
  9088107f
- Release 0.47.0 · c0dcdf27
  Matthew Douglas authored Aug 11, 2025
  
  c0dcdf27
- Temporary updates for release · 59593890
  Matthew Douglas authored Aug 11, 2025
  
  59593890
06 Aug, 2025 2 commits
- Merge pull request #1721 from Mhmd-Hisham/quantization-packing-bug-fix · 19fe95ac
  Matthew Douglas authored Aug 06, 2025
```
[CUDA] Fixing quantization uint8 packing bug for NF4 and FP4
```
  19fe95ac
- Merge pull request #1719 from ved1beta/fsdp_integration2 · 42653921
  Matthew Douglas authored Aug 06, 2025
```
Fix Params4bit tensor subclass handling
```
  42653921
04 Aug, 2025 1 commit
- lint · 0ecb8fb4
  ved1beta authored Aug 04, 2025
  
  0ecb8fb4
02 Aug, 2025 2 commits
- test_params4bit_torch_chunk_split · 2938c739
  ved1beta authored Aug 02, 2025
  
  2938c739
- Fixing quantization uint8 packing bug for NF4 and FP4 · 639f8c05
  Mohamed Hisham authored Aug 02, 2025
  
  639f8c05
31 Jul, 2025 1 commit
- Fix Params4bit tensor subclass handling · 1dbe6021
  ved1beta authored Aug 01, 2025
  
  1dbe6021
21 Jul, 2025 4 commits
- Merge pull request #1715 from bitsandbytes-foundation/adjust-cuda-build · e54dc125
  Matthew Douglas authored Jul 21, 2025
```
Add Volta support in cu128/cu129 builds
```
  e54dc125
- Add Volta support in cu128/cu129 builds · ec192295
  Matthew Douglas authored Jul 21, 2025
  
  ec192295
- Merge pull request #1714 from bitsandbytes-foundation/add-funding · 33449ee4
  Matthew Douglas authored Jul 21, 2025
```
Create FUNDING.yml
```
  33449ee4
- Create FUNDING.yml · df67c707
  Matthew Douglas authored Jul 21, 2025
  
  df67c707