Commits · b1f80b8acc6f8cfa7932dece460f6b600466dd34 · OpenDAS / bitsandbytes

18 Sep, 2025 1 commit

[CUDA] Branchless NF4/FP4 kDequantizeBlockwise kernel for faster dequantization (#1746) · b1f80b8a

Mohamed Hisham authored Sep 18, 2025

* Added branchless LUT-based dequantization for FP4 and NF4

* Added extra command line options to control reproducibility

* Restore FP4 quantization/dequantization order

b1f80b8a
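
The "branchless LUT-based dequantization" named in the commit above can be pictured with a minimal Python sketch. This is not the CUDA kernel from the commit. It only illustrates the general idea: each 4-bit code is an index into a 16-entry table, so decoding needs no chain of comparisons on the bits. The table values, the function name `dequantize_block`, and the nibble order (first value in the high nibble) are assumptions made for illustration.

```python
from typing import List, Sequence

# Placeholder 16-entry code table. These are NOT the real NF4 or FP4 values;
# they stand in for whatever table the actual kernel indexes.
PLACEHOLDER_LUT: List[float] = [(i - 8) / 8.0 for i in range(16)]


def dequantize_block(packed: bytes, absmax: float, lut: Sequence[float]) -> List[float]:
    """Expand packed 4-bit codes into floats using table lookups only."""
    if len(lut) != 16:
        raise ValueError("a 4-bit lookup table needs exactly 16 entries")
    out: List[float] = []
    for byte in packed:
        # Assumed layout: first value in the high nibble, second in the low nibble.
        out.append(lut[byte >> 4] * absmax)
        out.append(lut[byte & 0x0F] * absmax)
    return out


if __name__ == "__main__":
    # Two bytes hold four 4-bit codes: 0, 8, 15, 0.
    print(dequantize_block(bytes([0x08, 0xF0]), absmax=2.0, lut=PLACEHOLDER_LUT))
```

Every code takes the same path through the function: one index, one multiply by the block's scale. On a GPU that uniformity is what avoids divergent branches between threads.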

16 Sep, 2025 3 commits
- Bump minimum PyTorch to 2.3 (#1754) · c9bce2b4
  Matthew Douglas authored Sep 16, 2025
```
* Bump minimum PyTorch to 2.3

* Tests: Fix Windows numpy<2 compatibility for torch<2.4.1
```
  c9bce2b4
- Drop Maxwell (sm50) build from distribution (#1755) · dd1929ba
  Matthew Douglas authored Sep 16, 2025
  
  dd1929ba
- [XPU] Implemented 8bit optimizers in triton (#1692) · 404e2776
  Egor authored Sep 16, 2025
```
* implemented 8bit optimizers

* Add interface

* Commented out torch checks

* Merged

* Updated kernels

* Reused code for quant/dequant

* Removed empty line

* Changed Readme
```
  404e2776
15 Sep, 2025 4 commits

Lint fix · 4b025748
Matthew Douglas authored Sep 15, 2025

4b025748

Add SYCL Kernels for XPU backend (#1679) · 1813b058

Liu Xiaoli authored Sep 15, 2025



* Add SYCL Kernels for XPU backend

* fix transpose
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix log and format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert cpu changes
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* clean ipex_xpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* clean ipex import
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix ipex cpu import
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comments
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* refine gemv_4bit kernel

* enable FP4 for dequant_4bit and gemv_4bit

* refine FP4 dequantization performance

* remove check for better performance
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix doc
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* clean code

* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm comments
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix memory issue

* fix ut failure

* adjust threshold
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* change test_functional check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix test_module
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Enable Windows build and refine code

* fix xpu log
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* remove ipex entirely
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cpu int8 CB
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix lint
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix logs (#12)

* fix logs
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix sycl lint error and tests (#13)

* fix sycl nd
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip typo check for xpu kernel codes (#14)

* skip test for xpu ops
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix lint
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip typo for xpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* register triton kernel for quantization (#15)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix version comparison issue (#18)

# Description

The version comparison expression applied the `.release` property to only one side of the comparison, so a tuple was compared with a `Version` object.

# Error message
```
The 8-bit optimizer is not available on your device, only available on CUDA for now.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Traceback (most recent call last):
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/unsloth_validation/run.py", line 1, in <module>
    import unsloth
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/__init__.py", line 235, in <module>
    from .models import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
    from .llama     import FastLlamaModel
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/llama.py", line 23, in <module>
    from ._utils import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/_utils.py", line 89, in <module>
    from unsloth_zoo.patching_utils import (
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth_zoo/patching_utils.py", line 629, in <module>
    import transformers.integrations.bitsandbytes
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 20, in <module>
    import bitsandbytes as bnb
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/__init__.py", line 39, in <module>
    from .backends.xpu import ops as xpu_ops
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/backends/xpu/ops.py", line 17, in <module>
    if version.parse(torch.__version__).release >= version.parse("2.9"):
TypeError: '>=' not supported between instances of 'tuple' and 'Version'
```

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Er-Xin (Edwin) Shang <shangerxin@hotmail.com>

1813b058
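
The version comparison failure described under "Fix version comparison issue (#18)" in the commit above can be reproduced and avoided with `packaging.version`. The sketch below shows two consistent ways to write such a check. The helper names `at_least` and `release_at_least` are hypothetical, and the log does not say which form the actual fix used.

```python
from packaging import version


def at_least(installed: str, minimum: str) -> bool:
    """Return True when `installed` is at least `minimum` under PEP 440 ordering."""
    # Compare Version with Version. Mixing `.release` (a tuple) with a
    # Version object raises the TypeError shown in the traceback.
    return version.parse(installed) >= version.parse(minimum)


def release_at_least(installed: str, minimum: str) -> bool:
    """Same check on release tuples, which ignores dev/pre/local suffixes."""
    return version.parse(installed).release >= version.parse(minimum).release


if __name__ == "__main__":
    for candidate in ("2.8.0+cpu", "2.9.0.dev20250801", "2.9.0"):
        print(candidate, at_least(candidate, "2.9"), release_at_least(candidate, "2.9"))
```

The two forms differ for development builds: a `2.9.0.dev...` version sorts before `2.9` as a `Version`, but its release tuple `(2, 9, 0)` compares as at least `(2, 9)`. Which behavior is wanted depends on whether nightly builds should pass the check.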

[XPU] Implemented 32bit optimizers in triton (#1710) · 275671be

YangKai0616 authored Sep 15, 2025



* Implemented 32bit optimizers in triton

* Modify Comments

* Optimizing pure torch implementation

* Restore the order of parameters and modify the position of pure pytorch implementation

* Restore files permissions

---------
Co-authored-by: Fanli Lin <fanli.lin@intel.com>

275671be

Lint fix · d848d4db
Matthew Douglas authored Sep 15, 2025

d848d4db

09 Sep, 2025 1 commit

Test improvements (#1750) · 6a07ffe0

Matthew Douglas authored Sep 09, 2025

* Test suite improvements for MPS/XPU/HPU

* Skip test on torch==2.8.0+cpu for Windows regression

6a07ffe0

08 Sep, 2025 2 commits

Adjust 4bit test tolerance on CPU for larger blocksizes (#1749) · d731fc42
Matthew Douglas authored Sep 08, 2025

d731fc42

4bit quantization for arbitrary `nn.Parameter` (#1720) · 27549fb0

Matthew Douglas authored Sep 08, 2025

* Add parametrize util for targeting parameters outside of nn.Linear modules

* Parametrize 4bit: replace existing prequantized weight

* cleanup

* Add caching for parametrization

* Add tests

* Fix tests

* Guard for torch < 2.5

* Guard for torch < 2.5

* Another test guard for torch >= 2.5

27549fb0

03 Sep, 2025 2 commits

for intel xpu case, use MatMul8bitFp even not use ipex (#1728) · 39dd8471

kaixuanliu authored Sep 04, 2025



* for intel xpu case, use MatMul8bitFp even not use ipex
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix lint issue
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

39dd8471

add int mm for xpu after torch 2.9 (#1736) · a09d05a0

jiqing-feng authored Sep 04, 2025



* add int mm for xpu after torch 2.9
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add packaging on pyproject
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

a09d05a0

02 Sep, 2025 1 commit

Enable F841 (#1727) · c76e208f

Yuanyuan Chen authored Sep 02, 2025



* Fix unused variable warnings and other ruff warnings
Signed-off-by: cyy <cyyever@outlook.com>

* Fix format
Signed-off-by: cyy <cyyever@outlook.com>

---------
Signed-off-by: cyy <cyyever@outlook.com>

c76e208f

25 Aug, 2025 1 commit
- add py.typed (#1726) · ff389db7
  Yuanyuan Chen authored Aug 26, 2025
```
Signed-off-by: cyy <cyyever@outlook.com>
```
  ff389db7
11 Aug, 2025 4 commits
- Restore temporary changes from release · 7bfe923c
  Matthew Douglas authored Aug 11, 2025
  
  7bfe923c
- Bump dev version · 9088107f
  Matthew Douglas authored Aug 11, 2025
  
  9088107f
- Release 0.47.0 · c0dcdf27
  Matthew Douglas authored Aug 11, 2025
  
  c0dcdf27
- Temporary updates for release · 59593890
  Matthew Douglas authored Aug 11, 2025
  
  59593890
06 Aug, 2025 2 commits
- Merge pull request #1721 from Mhmd-Hisham/quantization-packing-bug-fix · 19fe95ac
  Matthew Douglas authored Aug 06, 2025
```
[CUDA] Fixing quantization uint8 packing bug for NF4 and FP4
```
  19fe95ac
- Merge pull request #1719 from ved1beta/fsdp_integration2 · 42653921
  Matthew Douglas authored Aug 06, 2025
```
Fix Params4bit tensor subclass handling
```
  42653921
04 Aug, 2025 1 commit
- lint · 0ecb8fb4
  ved1beta authored Aug 04, 2025
  
  0ecb8fb4
02 Aug, 2025 2 commits
- test_params4bit_torch_chunk_split · 2938c739
  ved1beta authored Aug 02, 2025
  
  2938c739
- Fixing quantization uint8 packing bug for NF4 and FP4 · 639f8c05
  Mohamed Hisham authored Aug 02, 2025
  
  639f8c05
31 Jul, 2025 1 commit
- Fix Params4bit tensor subclass handling · 1dbe6021
  ved1beta authored Aug 01, 2025
  
  1dbe6021
21 Jul, 2025 4 commits
- Merge pull request #1715 from bitsandbytes-foundation/adjust-cuda-build · e54dc125
  Matthew Douglas authored Jul 21, 2025
```
Add Volta support in cu128/cu129 builds
```
  e54dc125
- Add Volta support in cu128/cu129 builds · ec192295
  Matthew Douglas authored Jul 21, 2025
  
  ec192295
- Merge pull request #1714 from bitsandbytes-foundation/add-funding · 33449ee4
  Matthew Douglas authored Jul 21, 2025
```
Create FUNDING.yml
```
  33449ee4
- Create FUNDING.yml · df67c707
  Matthew Douglas authored Jul 21, 2025
  
  df67c707
14 Jul, 2025 11 commits
- Test fix · 14147f6f
  Matthew Douglas authored Jul 14, 2025
  
  14147f6f
- Merge pull request #1706 from Egor-Krivov/egor/8bit_int · 941681da
  Matthew Douglas authored Jul 14, 2025
```
Add kernel registration for 8bit and 32bit optimizers
```
  941681da
- Fixed default args · 0f6fe6bf
  Egor Krivov authored Jul 14, 2025
  
  0f6fe6bf
- Added mutated args to the schema · e33ba1c0
  Egor Krivov authored Jul 14, 2025
  
  e33ba1c0
- Removed cpu · 24d9139e
  Egor Krivov authored Jul 14, 2025
  
  24d9139e
- Changed number of errors · 36f5c4f4
  Egor Krivov authored Jul 14, 2025
  
  36f5c4f4
- Reverse lion · 236124ee
  Egor Krivov authored Jul 14, 2025
  
  236124ee
- Update to kernel registration · 4075a643
  Egor Krivov authored Jul 14, 2025
  
  4075a643
- Add no_cpu for optimizers · 223fea51
  Egor Krivov authored Jul 14, 2025
  
  223fea51
- Add 32bit optimizer interface · 3b89a05e
  Egor Krivov authored Jul 14, 2025
  
  3b89a05e
- enabled tests · abf4a1e3
  Egor Krivov authored Jul 14, 2025
  
  abf4a1e3