Commits · 74f7aa06f31ee8d453f18290537b69359fd69fb2 · OpenDAS / bitsandbytes

23 Oct, 2025 1 commit

fix code, compiled and tested successfully · 74f7aa06

limm authored Oct 23, 2025

74f7aa06

24 Sep, 2025 1 commit

Fix for warpSize deprecation in ROCm 7.0 (#1762) · b72b766e

pnunna93 authored Sep 24, 2025



* Port ROCm changes from multi-backend-refactor branch

* Update ops.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update test_ops.py

* Update test_functional.py

* Update test_ops.py

* Update test_functional.py

* Update test_functional.py

* Update functional.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update test_functional.py

* Update test_functional.py

* Update cextension.py

* Update cuda_specs.py

* Update cuda_specs.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_cuda_setup_evaluator.py

* Update test_functional.py

* Update modules.py

* Update modules.py

* Update ops.py

* Update test_linear4bit.py

* Update ops.py

* Update ops.py

* Update test_linear4bit.py

* Update test_linear4bit.py

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Create build-rocm.sh

* Update cuda_specs.py

* Fix trailing whitespace

* Remove conflicts.diff

* update for hipblasVersionMajor >=3

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Update main.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Update test_linear4bit.py

* Lint

* Lint

* Update helpers.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Lint

* Update pythonInterface.cpp

* lint fix

* lint

* Update pythonInterface.cpp

* revert permissions change

* Fix indentation

* Update kernels_hip.cuh

* Update kernels.hip

* Update ops.hip

* Update ops_hip.cuh

* Update kernels_hip.cuh

* Update kernels.hip

* Update kernels.hip

* Update ops.hip

* Update ops_hip.cuh

* Update ops.hip

* Update CMakeLists.txt

* Update functional.py

* Update cextension.py

* Update cextension.py

* warpSize is being made non constexpr in ROCm 7.0

* Merge pull request #90 from ROCm/IFU-rocm_enabled-09-23-2025

Ifu rocm enabled 09 23 2025

* Fix typo

* unskip test_4bit_quant

---------
Co-authored-by: MISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
Co-authored-by: MISHANMAUYRA <mishanmaurya31081@gmail.com>
Co-authored-by: amcamd <andrew.chapman@amd.com>
Co-authored-by: Prasanth Nunna <root@banff-cyxtera-s78-1.amd.com>
Co-authored-by: sstamenk <strahinja.stamenkovic@amd.com>

b72b766e

23 Sep, 2025 1 commit

Add CUDA 13.0 Support (#1761) · bdb8b2b7

Matthew Douglas authored Sep 23, 2025

* CUDA 13 build enablement

* Try to fix Windows build workflow

* Add torch 2.9+cu130 to tests

* Fix python version

* Update test workflow

* Don't test CPU on torch 2.9 yet

* Update doc

bdb8b2b7

15 Sep, 2025 1 commit

Add SYCL Kernels for XPU backend (#1679) · 1813b058

Liu Xiaoli authored Sep 15, 2025



* Add SYCL Kernels for XPU backend

* fix transpose
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix log and format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert cpu changes
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* clean ipex_xpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* clean ipex import
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix ipex cpu import
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comments
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* refine gemv_4bit kernel

* enable FP4 for dequant_4bit and gemv_4bit

* refine FP4 dequantization performance

* remove check for better performance
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix doc
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* clean code

* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm comments
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix memory issue

* fix ut failure

* adjust threshold
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* change test_functional check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix test_module
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Enable Windows build and refine code

* fix xpu log
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* remove ipex entirely
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cpu int8 CB
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix lint
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix logs (#12)

* fix logs
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix sycl lint error and tests (#13)

* fix sycl nd
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip typo check for xpu kernel codes (#14)

* skip test for xpu ops
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix lint
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip typo for xpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* skip
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* register triton kernel for quantization (#15)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix version comparison issue (#18)

# Description

The version comparison expression is missing the `.release` reference on one of the two version objects. This leads to a comparison between a tuple and a `Version` object.

# Error message
```
The 8-bit optimizer is not available on your device, only available on CUDA for now.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Traceback (most recent call last):
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/unsloth_validation/run.py", line 1, in <module>
    import unsloth
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/__init__.py", line 235, in <module>
    from .models import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/__init__.py", line 15, in <module>
    from .llama     import FastLlamaModel
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/llama.py", line 23, in <module>
    from ._utils import *
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth/models/_utils.py", line 89, in <module>
    from unsloth_zoo.patching_utils import (
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/unsloth_zoo/patching_utils.py", line 629, in <module>
    import transformers.integrations.bitsandbytes
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/v/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 20, in <module>
    import bitsandbytes as bnb
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/__init__.py", line 39, in <module>
    from .backends.xpu import ops as xpu_ops
  File "/home/erxin/jenkins/workspace/Unsloth_Benchmark/bitsandbytes/bitsandbytes/backends/xpu/ops.py", line 17, in <module>
    if version.parse(torch.__version__).release >= version.parse("2.9"):
TypeError: '>=' not supported between instances of 'tuple' and 'Version'
```
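
The failing line compares a release tuple on the left with a whole `Version` object on the right. A minimal sketch of the like-with-like comparison, assuming a stdlib-only stand-in for the `.release` tuple that `packaging.version` exposes; `release_tuple` and `is_at_least` are hypothetical helpers for illustration, not the code that was merged:

```python
import re
from typing import Tuple


def release_tuple(version_string: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '2.9.0+xpu' -> (2, 9, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"no numeric release segment in {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare tuple against tuple, so both operands have the same type."""
    return release_tuple(installed) >= release_tuple(minimum)


if __name__ == "__main__":
    print(is_at_least("2.9.0+xpu", "2.9"))
    print(is_at_least("2.8.1", "2.9"))
```

Comparing integer tuples also avoids the string-ordering trap where "2.10" would sort before "2.9".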

---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Er-Xin (Edwin) Shang <shangerxin@hotmail.com>

1813b058

20 Jun, 2025 1 commit

Enable ROCm backend with custom ops integration (#1683) · 888788d7

pnunna93 authored Jun 20, 2025



* Port ROCm changes from multi-backend-refactor branch

* Update ops.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update test_ops.py

* Update test_functional.py

* Update test_ops.py

* Update test_functional.py

* Update test_functional.py

* Update functional.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update test_functional.py

* Update test_functional.py

* Update cextension.py

* Update cuda_specs.py

* Update cuda_specs.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_cuda_setup_evaluator.py

* Update test_functional.py

* Update modules.py

* Update modules.py

* Update ops.py

* Update test_linear4bit.py

* Update ops.py

* Update ops.py

* Update test_linear4bit.py

* Update test_linear4bit.py

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Create build-rocm.sh

* Update cuda_specs.py

* Fix trailing whitespace

* Remove conflicts.diff

* update for hipblasVersionMajor >=3

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Update main.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Update test_linear4bit.py

* Lint

* Lint

* Update helpers.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Lint

* Update pythonInterface.cpp

* lint fix

* lint

* Update pythonInterface.cpp

* revert permissions change

* Fix indentation

* Update kernels_hip.cuh

* Update kernels.hip

* Update ops.hip

* Update ops_hip.cuh

* Update kernels_hip.cuh

* Update kernels.hip

* Update kernels.hip

* Update ops.hip

* Update ops_hip.cuh

* Update ops.hip

* Update CMakeLists.txt

* Update functional.py

* Update cextension.py

* Update cextension.py

---------
Co-authored-by: MISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
Co-authored-by: MISHANMAUYRA <mishanmaurya31081@gmail.com>
Co-authored-by: amcamd <andrew.chapman@amd.com>
Co-authored-by: Prasanth Nunna <root@banff-cyxtera-s78-1.amd.com>

888788d7

22 Apr, 2025 1 commit

Stop building for CUDA toolkit < 11.8 (#1605) · 53daa0e2

Matthew Douglas authored Apr 22, 2025

* Stop building for CUDA toolkit < 11.8

* Simplify

* Drop sm70 from cu128 build targets to align with pytorch

53daa0e2

22 Jan, 2025 1 commit

Initial support blackwell (#1481) · db90effe

Johnny authored Jan 22, 2025



* initial support blackwell

* Update CHANGELOG.md
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update CMakeLists.txt

* Update CHANGELOG.md

* fix build-cuda.sh

* fix build-cuda.sh

* fix cuda 12.7 build-cuda.sh

* Update build-cuda.sh

* Update cuda from 12.6.2 to 12.6.3

* Update .github/workflows/python-package.yml
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update install_cuda.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update install_cuda.sh
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update .github/scripts/build-cuda.sh

* Update install_cuda.sh

---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

db90effe

05 Dec, 2024 1 commit

LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d

Matthew Douglas authored Dec 05, 2024



* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation

* Fix unintended change

* New naive mm_dequant kernel for row-major; cleanup

* fix

* int8 refactor: initial sparse decomp, cleanup

* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup

* int8: inference optimizations, some cleanup

* int8: more tests passing, cleanup

* int8 - more cleanup, most tests passing

* int8: specify CUDA stream for int8 ops

* perf: reduce overhead from getting cudaStream ptr

* Mark some functions for deprecation.

* int8 sparse decomp: small perf improvement

* update setup.py

* Update bitsandbytes/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn

* int8 cleanup

* Ignore ruff rule ISC001 (incompatible with formatter)

* add comment

* int8 more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8: rename / deprecate old fn signatures

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* type annotation

* format update

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Add comment to explain division optimization

* more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Type annotations, cleanup

* remove unused kernels; improved type annotations

* small perf optimization for single-GPU systems

* small perf optimization for single-GPU systems

* update docstrings

* Improve docs and tests

* Update docstring

* Update test

* add benchmarking script

* test cleanup: add deprecated marker, move benchmarks out

* Add int8 dequant function; misc improvements

* int8 matmul fallback for inner dims not divisible by 4

* improve register usage of kInt8VectorQuant - especially for A100/H100

* disable fail-fast for package build

* maxwell compat

* ptxas verbose

* docs update

* doc update

* backward fix

* Bugfix sparse decomp

* Int8 fix for PEFT OLoRA init

* Fix test for deprecated spmm_coo

* test improvement

* doc update

* typo

* doc cleanup

* docs

* add inference benchmark script

* Add benchmarks, doc update

---------
Co-authored-by: Aarni Koskela <akx@iki.fi>

81e6345d

09 Sep, 2024 1 commit

Update for VS2022 17.11 compatibility with CUDA < 12.4 (#1341) · 17da4f6f

Matthew Douglas authored Sep 09, 2024

* Update for VS2022 17.11 compatibility with CUDA < 12.4

* Try again

17da4f6f

15 Jul, 2024 1 commit

Fix Windows CUDA build compatibility with newest MSVC (#1276) · 6948f0b8

Matthew Douglas authored Jul 15, 2024

* Add support for building with latest MSVC

* Update MSVC 1940+ support for CUDA builds.

6948f0b8

08 Mar, 2024 1 commit

Build: Expand CUDA Toolkit Matrix (#1111) · 1cfc2777

Matthew Douglas authored Mar 07, 2024



* (ci) build with wider CUDA version matrix

* (ci) build with wider CUDA version matrix

* (ci) skip sm_89 target on CUDA 11.7

* (ci) skip sm_90 target on CUDA 11.8

* modify workflow to publish to test.pypi

* (build) Test for manylinux_2_24 build on GH actions

* (build) got that backwards.

* try fixing manual triggering condition for testpypi

* try if Ubuntu 18.04 is an easy fix to allow for `manylinux_2_24` compatibility

* hardcode publish step to run to test publishing

* set ubuntu to newest supported version

* try statically linking libstdc++ to achieve manylinux_2_18

* last commit only brought us to manylinux_2_34, reverse

* add missing permission for publishing to pypi

* snake case deprecated in favor of kebab

* downgrade cuda ubuntu aiming for manylinux_2_24

* add step to upgrade cmake due to old Ubuntu for CUDA build

* adjust path to prefer pip installed cmake

* (cmake) set CMAKE_BUILD_TYPE=Release if unspecified

* default to CMAKE_BUILD_TYPE Release for optimized releases and better many_linux compatibility

* (build) back to ubuntu22.04 docker images

* verify CMake in separate step

* add clarifying comment about Python version compatibility

* (build) we don't need cmake for wheel step

* fixup testpypi publish to run in PR for testing

* add pypi publishing when tagged on main

* add functionality to rewrite platform tags

* (ci) adjust platform tags for wheels

* fix for windows, get order right.

* fix for windows, get order right.

* (build) slim down those fatbins on windows cuda

* sloppy

* remove broken PyPi upload for now

---------
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>

1cfc2777

27 Feb, 2024 2 commits

(cmake) Update library output directory (#1080) · cc5f8cd8
Matthew Douglas authored Feb 27, 2024

cc5f8cd8

(cmake) Fix cuda arch selection (#1091) · 753df25c

Matthew Douglas authored Feb 27, 2024

* (cmake) Fix generation of targets for nvcc

* Typo

* (ci) linux + CUDA workflow: make sure we specify target architectures

* fix

* fix one more time

* (cmake) Default in CMAKE_CUDA_ARCHITECTURES_ALL when cmake<3.23, make sure we build only selected cubins and only ptx for latest capability

* Fix static lookup for CMAKE_CUDA_ARCHITECTURES_ALL on cmake<3.23

* Remove debug setting

* clarification
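
The "only selected cubins and only ptx for latest capability" item above can be pictured as a flag-building step. A minimal Python sketch, assuming capabilities are plain numeric strings such as "75"; `gencode_flags` is a hypothetical helper for illustration and is not taken from CMakeLists.txt. Only the `-gencode` flag syntax itself comes from nvcc:

```python
from typing import Iterable, List


def gencode_flags(capabilities: Iterable[str]) -> List[str]:
    """Build nvcc -gencode flags: one cubin per capability, PTX only for the newest."""
    # De-duplicate and order numerically so "100" sorts after "90".
    ordered = sorted(set(capabilities), key=int)
    # code=sm_XX embeds a cubin (machine code) for exactly that architecture.
    flags = [f"-gencode=arch=compute_{cc},code=sm_{cc}" for cc in ordered]
    if ordered:
        latest = ordered[-1]
        # code=compute_XX embeds PTX, which newer GPUs can JIT-compile.
        flags.append(f"-gencode=arch=compute_{latest},code=compute_{latest}")
    return flags


if __name__ == "__main__":
    for flag in gencode_flags(["80", "75", "90"]):
        print(flag)
```

Shipping PTX for only the newest capability keeps the binary smaller while still leaving a forward-compatible path for later GPUs.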

753df25c

06 Feb, 2024 1 commit

HOTFIX: Fix regression (cpu fix) (#1038) · 6e0f84d4

Won-Kyu Park authored Feb 07, 2024



* add "_cpu" tag correctly (regression)

* add lib suffix ".dylib" for Darwin
Co-authored-by: Aarni Koskela <akx@iki.fi>

---------
Co-authored-by: Aarni Koskela <akx@iki.fi>
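
The two items in this hotfix (a "_cpu" tag and a ".dylib" suffix on Darwin) amount to choosing a library file name per backend and platform. A minimal sketch, assuming a `libbitsandbytes_<tag><suffix>` naming pattern; the helper `native_library_name` and the suffix table are illustrative assumptions, not the project's loader code:

```python
import platform
from typing import Optional

# Assumed per-OS shared-library suffixes; anything else falls back to ".so".
SUFFIXES = {"Windows": ".dll", "Darwin": ".dylib"}


def native_library_name(backend_tag: str, system: Optional[str] = None) -> str:
    """Return the expected library file name for a backend tag such as 'cpu'."""
    system = system or platform.system()
    suffix = SUFFIXES.get(system, ".so")
    return f"libbitsandbytes_{backend_tag}{suffix}"


if __name__ == "__main__":
    print(native_library_name("cpu"))
```

Passing `system` explicitly makes the selection easy to check for platforms other than the one running the code.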

6e0f84d4

05 Feb, 2024 1 commit

Make native code portable and add GitHub workflow for building (#949) · 73d3e7b6

Rickard authored Feb 05, 2024



* Make native code portable and add GitHub workflow for building

* Removed deprecated Python versions

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml

* Do not test on Python 3.13 until released

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Refactor build stage

* Fixed breaking actions change

* Slim down Windows cuda

* Create dependabot.yml

* Bespoke local dev requirements.txt

* Enable VS integration

* Group Dependabot updates

* Cleanup

* Update python-package.yml

* Reinstate file that was wrongly merged

* Fixed regression caused by new version of download-artifact

* Update python-package.yml

* Update python-package.yml

* Fix matrix

* Update python-package.yml

* Merge

* Pipeline

* Fixed conflict

* Fixed conflict

* Update CMakeLists.txt

* Fixed merge error

* cleanup

* cleanup

* Find CUDA

* Fix

* Fixing merge error from latest merge from main

* Fix setup.py

* Fixed typo in artifact name

* Remove linker flags

* Build nocublaslt versions

* Fixed formatting

* Fixed VS Code format on save

* Ran format on save from VScode

* Re-saved the json files using the new settings

* Re-saved CMakeLists.txt to get formatting right

* Add path filter

* Formatting

---------
Co-authored-by: Aarni Koskela <akx@iki.fi>

73d3e7b6

01 Feb, 2024 1 commit

Add CMakeLists.txt · 5f76fe9d

James Wyatt authored Sep 25, 2023



* fix project name and add lib prefix for win32 (2024/01/31)

* set LIBRARY_OUTPUT_DIRECTORY property
Co-authored-by: Won-Kyu Park <wkpark@gmail.com>

5f76fe9d