Commits · 1abd5e781013a085f86586b30a248dc769909668 · OpenDAS / bitsandbytes

27 Jun, 2025 1 commit

Matthew Douglas authored Jun 27, 2025

* Add CUDA 12.9 to build/test workflows

* Downgrade Jimver/cuda-toolkit to v0.2.24

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Update tests.yml

* Update tests.yml

1abd5e78

20 Jun, 2025 1 commit

Enable ROCm backend with custom ops integration (#1683) · 888788d7

pnunna93 authored Jun 20, 2025



* Port ROCm changes from multi-backend-refactor branch

* Update ops.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update ops.py

* Update functional.py

* Update functional.py

* Update functional.py

* Update test_ops.py

* Update test_functional.py

* Update test_ops.py

* Update test_functional.py

* Update test_functional.py

* Update functional.py

* Update functional.py

* Update ops.py

* Update ops.py

* Update test_functional.py

* Update test_functional.py

* Update cextension.py

* Update cuda_specs.py

* Update cuda_specs.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_cuda_setup_evaluator.py

* Update test_functional.py

* Update modules.py

* Update modules.py

* Update ops.py

* Update test_linear4bit.py

* Update ops.py

* Update ops.py

* Update test_linear4bit.py

* Update test_linear4bit.py

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Create build-rocm.sh

* Update cuda_specs.py

* Fix trailing whitespace

* Remove conflicts.diff

* update for hipblasVersionMajor >=3

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Update main.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Update test_linear4bit.py

* Lint

* Lint

* Update helpers.py

* Update test_functional.py

* Update test_linear4bit.py

* Update test_ops.py

* Lint

* Update pythonInterface.cpp

* lint fix

* lint

* Update pythonInterface.cpp

* revert permissions change

* Fix indentation

* Update kernels_hip.cuh

* Update kernels.hip

* Update ops.hip

* Update ops_hip.cuh

* Update kernels_hip.cuh

* Update kernels.hip

* Update kernels.hip

* Update ops.hip

* Update ops_hip.cuh

* Update ops.hip

* Update CMakeLists.txt

* Update functional.py

* Update cextension.py

* Update cextension.py

---------
Co-authored-by: MISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
Co-authored-by: MISHANMAUYRA <mishanmaurya31081@gmail.com>
Co-authored-by: amcamd <andrew.chapman@amd.com>
Co-authored-by: Prasanth Nunna <root@banff-cyxtera-s78-1.amd.com>

888788d7

13 May, 2025 1 commit
- Switch CUDA builds to use Rocky Linux 8 container (#1638) · d870f9c5
  Matthew Douglas authored May 13, 2025

  d870f9c5

02 May, 2025 1 commit

Linux aarch64 CI updates (#1622) · 49c044b1

Matthew Douglas authored May 02, 2025

* Add aarch64 cpu tests and CUDA build to nightly workflow

* aarch64: limit CUDA targets to sm75, sm80, sm90, sm100

* aarch64: limit CUDA targets to sm75, sm80, sm90, sm100

* Update build cpu script

* fix

* Update auditwheel for aarch64

49c044b1

29 Apr, 2025 1 commit

Set up nightly CI for unit tests (#1619) · a5dd01bb

Matthew Douglas authored Apr 29, 2025

* Run unit tests on GH Actions

* fix

* fix

* trigger workflow

* Update

* Update

* Update

* Run tests nightly

* Disable paged optimizer test on Windows

* Skip unit tests on Windows for CUDA 12.x (driver on runner is too old)

a5dd01bb

22 Apr, 2025 1 commit

Stop building for CUDA toolkit < 11.8 (#1605) · 53daa0e2

Matthew Douglas authored Apr 22, 2025

* Stop building for CUDA toolkit < 11.8

* Simplify

* Drop sm70 from cu128 build targets to align with pytorch

53daa0e2

24 Feb, 2025 2 commits
- Update build-cuda.sh · fc6d8b24
  Matthew Douglas authored Feb 24, 2025

  fc6d8b24
- Update build-cuda.sh · e4a9a94c
  Matthew Douglas authored Feb 24, 2025

  e4a9a94c

23 Jan, 2025 1 commit
- (build) include Ada/Hopper targets in cu118 build (#1487) · b4172770
  Matthew Douglas authored Jan 23, 2025

  b4172770

22 Jan, 2025 1 commit

Initial support blackwell (#1481) · db90effe

Johnny authored Jan 22, 2025



* initial support blackwell

* Update CHANGELOG.md
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update CMakeLists.txt

* Update CHANGELOG.md

* fix build-cuda.sh

* fix build-cuda.sh

* fix cuda 12.7 build-cuda.sh

* Update build-cuda.sh

* Update cuda from 12.6.2 to 12.6.3

* Update .github/workflows/python-package.yml
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update install_cuda.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update install_cuda.sh
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update .github/scripts/build-cuda.sh

* Update install_cuda.sh

---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

db90effe

05 Dec, 2024 1 commit

LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d

Matthew Douglas authored Dec 05, 2024



* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation

* Fix unintended change

* New naive mm_dequant kernel for row-major; cleanup

* fix

* int8 refactor: initial sparse decomp, cleanup

* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup

* int8: inference optimizations, some cleanup

* int8: more tests passing, cleanup

* int8 - more cleanup, most tests passing

* int8: specify CUDA stream for int8 ops

* perf: reduce overhead from getting cudaStream ptr

* Mark some functions for deprecation.

* int8 sparse decomp: small perf improvement

* update setup.py

* Update bitsandbytes/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn

* int8 cleanup

* Ignore ruff rule ISC001 (incompatible with formatter)

* add comment

* int8 more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8: rename / deprecate old fn signatures

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* type annotation

* format update

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Add comment to explain division optimization

* more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Type annotations, cleanup

* remove unused kernels; improved type annotations

* small perf optimization for single-GPU systems

* small perf optimization for single-GPU systems

* update docstrings

* Improve docs and tests

* Update docstring

* Update test

* add benchmarking script

* test cleanup: add deprecated marker, move benchmarks out

* Add int8 dequant function; misc improvements

* int8 matmul fallback for inner dims not divisible by 4

* improve register usage of kInt8VectorQuant - especially for A100/H100

* disable fail-fast for package build

* maxwell compat

* ptxas verbose

* docs update

* doc update

* backward fix

* Bugfix sparse decomp

* Int8 fix for PEFT OLoRA init

* Fix test for deprecated spmm_coo

* test improvement

* doc update

* typo

* doc cleanup

* docs

* add inference benchmark script

* Add benchmarks, doc update

---------
Co-authored-by: Aarni Koskela <akx@iki.fi>

81e6345d

13 Mar, 2024 1 commit
- Reformat with ruff-format · 5a4263f4
  Ruff authored Feb 24, 2024

  5a4263f4

11 Mar, 2024 2 commits
- Add audit-wheel step · 7af138ab
  Aarni Koskela authored Mar 08, 2024

  ```
  Closes #1114
  Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
  ```

  7af138ab
- Move build scripts to .github/scripts (from scripts/ and workflow YAML) · 62485a34
  Aarni Koskela authored Mar 08, 2024
  
  62485a34