Commits · 49c044b1daa22f17af6907ad26ae604758d55e1a · OpenDAS / bitsandbytes

02 May, 2025 2 commits

Linux aarch64 CI updates (#1622) · 49c044b1

Matthew Douglas authored May 02, 2025

* Add aarch64 cpu tests and CUDA build to nightly workflow

* aarch64: limit CUDA targets to sm75, sm80, sm90, sm100

* aarch64: limit CUDA targets to sm75, sm80, sm90, sm100

* Update build cpu script

* fix

* Update auditwheel for aarch64

49c044b1

Use ARM runners to build for Linux aarch64 (#1539) · 8a31eadf

Johnny authored May 02, 2025



* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Cleanup

* Matrix update

---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

8a31eadf

22 Apr, 2025 1 commit

Stop building for CUDA toolkit < 11.8 (#1605) · 53daa0e2

Matthew Douglas authored Apr 22, 2025

* Stop building for CUDA toolkit < 11.8

* Simplify

* Drop sm70 from cu128 build targets to align with pytorch

53daa0e2

07 Apr, 2025 1 commit

fix for missing cpu lib (#1585) · 55b84eea

Titus authored Apr 07, 2025

55b84eea

27 Mar, 2025 2 commits

Drop Python 3.8 support. (#1574) · 677ff400

Matthew Douglas authored Mar 27, 2025

* Drop Python 3.8 support.

* Formatting

677ff400

Bump CUDA 12.8.0 build to CUDA 12.8.1 (#1575) · 9b339952

Matthew Douglas authored Mar 27, 2025

9b339952

25 Feb, 2025 1 commit

Build: use ubuntu-22.04 instead of 24.04 for CPU build (glibc compat) (#1538) · b8223fed

Matthew Douglas authored Feb 25, 2025

b8223fed

28 Jan, 2025 1 commit

Blackwell binaries! (#1491) · f3e8cbb2

Johnny authored Jan 28, 2025

* blackwell

* blackwell

* Update python-package.yml

f3e8cbb2

17 Dec, 2024 1 commit

chore: migrate config files to `pyproject.toml` (#1373) · 5b015890

Saurav Maheshkar authored Dec 17, 2024



* chore: move configs to pyproject.toml

* fix: drop file from CI workflow

* feat: reorder pytest markers

* chore: retain comments

* chore(build): migrate build data to pyproject
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Aarni Koskela <akx@iki.fi>

* chore: move configs to pyproject.toml

* Apply suggestions from code review
Co-authored-by: Aarni Koskela <akx@iki.fi>

* bump ruff

---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Aarni Koskela <akx@iki.fi>

5b015890

05 Dec, 2024 1 commit

LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d

Matthew Douglas authored Dec 05, 2024



* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation

* Fix unintended change

* New naive mm_dequant kernel for row-major; cleanup

* fix

* int8 refactor: initial sparse decomp, cleanup

* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup

* int8: inference optimizations, some cleanup

* int8: more tests passing, cleanup

* int8 - more cleanup, most tests passing

* int8: specify CUDA stream for int8 ops

* perf: reduce overhead from getting cudaStream ptr

* Mark some functions for deprecation.

* int8 sparse decomp: small perf improvement

* update setup.py

* Update bitsandbytes/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn

* int8 cleanup

* Ignore ruff rule ISC001 (incompatible with formatter)

* add comment

* int8 more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* int8: rename / deprecate old fn signatures

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* type annotation

* format update

* Update bitsandbytes/research/autograd/_functions.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Add comment to explain division optimization

* more cleanup

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update bitsandbytes/functional.py
Co-authored-by: Aarni Koskela <akx@iki.fi>

* cleanup

* Type annotations, cleanup

* remove unused kernels; improved type annotations

* small perf optimization for single-GPU systems

* small perf optimization for single-GPU systems

* update docstrings

* Improve docs and tests

* Update docstring

* Update test

* add benchmarking script

* test cleanup: add deprecated marker, move benchmarks out

* Add int8 dequant function; misc improvements

* int8 matmul fallback for inner dims not divisible by 4

* improve register usage of kInt8VectorQuant - especially for A100/H100

* disable fail-fast for package build

* maxwell compat

* ptxas verbose

* docs update

* doc update

* backward fix

* Bugfix sparse decomp

* Int8 fix for PEFT OLoRA init

* Fix test for deprecated spmm_coo

* test improvement

* doc update

* typo

* doc cleanup

* docs

* add inference benchmark script

* Add benchmarks, doc update

---------
Co-authored-by: Aarni Koskela <akx@iki.fi>

81e6345d

02 Dec, 2024 1 commit

[Build] Add CUDA 12.6.2 build; update 12.5.0 to 12.5.1 (#1431) · 7dca7004

Matthew Douglas authored Dec 02, 2024

* [Build] Add CUDA 12.6.2 build; update 12.5.0 to 12.5.1

* bump cuda-toolkit action version

* Update docs for cuda versions

7dca7004

30 Sep, 2024 3 commits

omit macos wheels for now · d873fb34

Titus von Koeller authored Sep 30, 2024

d873fb34

more descriptive continuous release name · 2a1ff2c0

Titus von Koeller authored Sep 30, 2024

2a1ff2c0

tweak continuous release of `main` · 4f198988

Titus von Koeller authored Sep 30, 2024

4f198988

24 Sep, 2024 1 commit

Add workflow to publish tagged releases to PyPI (#1369) · bdf381c8

Matthew Douglas authored Sep 23, 2024

* CI/CD: Add step to publish wheels on tag creation

* Remove file

* Restrict pre-release workflow branches

* Update PyPI publishing

* Update PyPI publishing

* Update package workflow name

* continuous pre-release only on main

bdf381c8

31 Jul, 2024 1 commit

packaging: bump permissions for continuous release step · 4be18838

Titus authored Jul 31, 2024

4be18838

29 Jul, 2024 1 commit

add job to upload wheels to continuous pre-release (#1282) · b64cbe32

Titus authored Jul 29, 2024

b64cbe32

21 Jul, 2024 1 commit

Add CUDA 12.5 and update 12.4 builds (#1284) · 0bdd57cc

Matthew Douglas authored Jul 21, 2024

* Add CUDA 12.5 builds and enable CUDA 12.4 on Windows

* Update install doc

0bdd57cc

08 Apr, 2024 2 commits

Exclude Windows from CUDA 12.4.0 build for now · ebac8625

Matthew Douglas authored Apr 08, 2024

ebac8625

Build workflow: Add CUDA 12.4 to build matrix · c0ad874a

Matthew Douglas authored Apr 08, 2024

c0ad874a

11 Mar, 2024 4 commits

Add commented-out test step to CI · ce597c63

Aarni Koskela authored Mar 08, 2024

ce597c63

Add audit-wheel step · 7af138ab

Aarni Koskela authored Mar 08, 2024

Closes #1114
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>

7af138ab

Move build scripts to .github/scripts (from scripts/ and workflow YAML) · 62485a34

Aarni Koskela authored Mar 08, 2024

62485a34

Reformat .github with Prettier · 958dfa99

Aarni Koskela authored Mar 08, 2024

958dfa99

08 Mar, 2024 1 commit

Build: Expand CUDA Toolkit Matrix (#1111) · 1cfc2777

Matthew Douglas authored Mar 07, 2024



* (ci) build with wider CUDA version matrix

* (ci) build with wider CUDA version matrix

* (ci) skip sm_89 target on CUDA 11.7

* (ci) skip sm_90 target on CUDA 11.8

* modify workflow to publish to test.pypi

* (build) Test for manylinux_2_24 build on GH actions

* (build) got that backwards.

* try fixing manual triggering condition for testpypi

* try if Ubuntu 18.04 is an easy fix to allow for `manylinux_2_24` compatibility

* hardcode publish step to run to test publishing

* set ubuntu to newest supported version

* try statically linking libstdc++ to achieve manylinux_2_18

* last commit only brought us to manylinux_2_34, reverse

* add missing permission for publishing to PyPI

* snake case deprecated in favor of kebab

* downgrade cuda ubuntu aiming for manylinux_2_24

* add step to upgrade cmake due to old Ubuntu for CUDA build

* adjust path to prefer pip installed cmake

* (cmake) set CMAKE_BUILD_TYPE=Release if unspecified

* default to CMAKE_BUILD_TYPE Release for optimized releases and better manylinux compatibility

* (build) back to ubuntu22.04 docker images

* verify CMake in separate step

* add clarifying comment about Python version compatibility

* (build) we don't need cmake for wheel step

* fixup testpypi publish to run in PR for testing

* add pypi publishing when tagged on main

* add functionality to rewrite platform tags

* (ci) adjust platform tags for wheels

* fix for windows, get order right.

* fix for windows, get order right.

* (build) slim down those fatbins on windows cuda

* sloppy

* remove broken PyPI upload for now

---------
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>

1cfc2777

28 Feb, 2024 1 commit

(ci) update apt repo before aarch64 build tools are installed (#1096) · f9eba9c8

Matthew Douglas authored Feb 28, 2024

f9eba9c8

27 Feb, 2024 2 commits

Add concurrency to not waste precious build minutes when modifying PRs frequently. (#1051) · 1d709aad

Rickard authored Feb 27, 2024

Co-authored-by: wkpark <wkpark@gmail.com>

1d709aad

(cmake) Fix cuda arch selection (#1091) · 753df25c

Matthew Douglas authored Feb 27, 2024

* (cmake) Fix generation of targets for nvcc

* Typo

* (ci) linux + CUDA workflow: make sure we specify target architectures

* fix

* fix one more time

* (cmake) Default in CMAKE_CUDA_ARCHITECTURES_ALL when cmake<3.23, make sure we build only selected cubins and only ptx for latest capability

* Fix static lookup for CMAKE_CUDA_ARCHITECTURES_ALL on cmake<3.23

* Remove debug setting

* clarification

753df25c
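
Note on 753df25c: the message describes building cubins only for the selected architectures and PTX only for the latest capability. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the project's CMake logic: the helper name `gencode_flags` and the example capability list are hypothetical, and only the `-gencode` flag syntax itself comes from nvcc.

```python
def gencode_flags(capabilities):
    """Return nvcc -gencode flags for the given compute capabilities.

    Hypothetical helper for illustration only: one cubin (code=sm_XX) per
    selected capability, plus PTX (code=compute_XX) for the newest one.
    """
    caps = sorted(set(capabilities))  # de-duplicate and order numerically
    if not caps:
        return []
    # Native machine code (cubin) for every selected architecture.
    flags = [f"-gencode=arch=compute_{c},code=sm_{c}" for c in caps]
    # PTX only for the newest capability, so newer GPUs can JIT-compile it.
    newest = caps[-1]
    flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return flags


if __name__ == "__main__":
    # Example capability list chosen for illustration.
    for flag in gencode_flags([75, 80, 90]):
        print(flag)
```

Emitting PTX for a single capability rather than for each one is what keeps the fat binary smaller, at the cost of JIT compilation on GPUs newer than any built cubin.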

19 Feb, 2024 1 commit

Fix cross compilation on linux (#1050) · c5e43637

Rickard authored Feb 19, 2024

c5e43637

14 Feb, 2024 1 commit

CI: Fix cuda toolkit speed issue. (#1055) · 344e8516

Won-Kyu Park authored Feb 15, 2024



* CI: fix cuda-toolkit speed issue

* CI: use MSVC instead of msbuild to remove 'visual_studio_integration' dependency

 * use Ninja to compile without MS toolset

* use 'network', install 'ninja' only
Co-authored-by: Rickard <rickardp@users.noreply.github.com>

---------
Co-authored-by: Rickard <rickardp@users.noreply.github.com>

344e8516

07 Feb, 2024 1 commit

Skip checkout nvidia cub (#1053) · 136721a8

Rickard authored Feb 08, 2024

136721a8

05 Feb, 2024 1 commit

Make native code portable and add GitHub workflow for building (#949) · 73d3e7b6

Rickard authored Feb 05, 2024



* Make native code portable and add GitHub workflow for building

* Removed deprecated Python versions

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml
Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update python-package.yml

* Do not test on Python 3.13 until released

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Update python-package.yml

* Refactor build stage

* Fixed breaking actions change

* Slim down Windows cuda

* Create dependabot.yml

* Bespoke local dev requirements.txt

* Enable VS integration

* Group Dependabot updates

* Cleanup

* Update python-package.yml

* Reinstate file that was wrongly merged

* Fixed regression caused by new version of download-artifact

* Update python-package.yml

* Update python-package.yml

* Fix matrix

* Update python-package.yml

* Merge

* Pipeline

* Fixed conflict

* Fixed conflict

* Update CMakeLists.txt

* Fixed merge error

* cleanup

* cleanup

* Find CUDA

* Fix

* Fixing merge error from latest merge from main

* Fix setup.py

* Fixed typo in artifact name

* Remove linker flags

* Build nocublaslt versions

* Fixed formatting

* Fixed VS Code format on save

* Ran format on save from VS Code

* Re-saved the json files using the new settings

* Re-saved CMakeLists.txt to get formatting right

* Add path filter

* Formatting

---------
Co-authored-by: Aarni Koskela <akx@iki.fi>

73d3e7b6