1. 23 Oct, 2025 1 commit
  2. 24 Sep, 2025 1 commit
    • Fix for warpSize deprecation in ROCm 7.0 (#1762) · b72b766e
      pnunna93 authored
      
      * Port ROCm changes from multi-backend-refactor branch
      
      * Update ops.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update test_ops.py
      
      * Update test_functional.py
      
      * Update test_ops.py
      
      * Update test_functional.py
      
      * Update test_functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update test_functional.py
      
      * Update test_functional.py
      
      * Update cextension.py
      
      * Update cuda_specs.py
      
      * Update cuda_specs.py
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_cuda_setup_evaluator.py
      
      * Update test_functional.py
      
      * Update modules.py
      
      * Update modules.py
      
      * Update ops.py
      
      * Update test_linear4bit.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update test_linear4bit.py
      
      * Update test_linear4bit.py
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Create build-rocm.sh
      
      * Update cuda_specs.py
      
      * Fix trailing whitespace
      
      * Remove conflicts.diff
      
      * update for hipblasVersionMajor >=3
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_ops.py
      
      * Update main.py
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_ops.py
      
      * Update test_linear4bit.py
      
      * Lint
      
      * Lint
      
      * Update helpers.py
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_ops.py
      
      * Lint
      
      * Update pythonInterface.cpp
      
      * lint fix
      
      * lint
      
      * Update pythonInterface.cpp
      
      * revert permissions change
      
      * Fix indentation
      
      * Update kernels_hip.cuh
      
      * Update kernels.hip
      
      * Update ops.hip
      
      * Update ops_hip.cuh
      
      * Update kernels_hip.cuh
      
      * Update kernels.hip
      
      * Update kernels.hip
      
      * Update ops.hip
      
      * Update ops_hip.cuh
      
      * Update ops.hip
      
      * Update CMakeLists.txt
      
      * Update functional.py
      
      * Update cextension.py
      
      * Update cextension.py
      
      * warpSize is being made non constexpr in ROCm 7.0
      
      * Merge pull request #90 from ROCm/IFU-rocm_enabled-09-23-2025
      
      Ifu rocm enabled 09 23 2025
      
      * Fix typo
      
      * unskip test_4bit_quant
      
      ---------
      Co-authored-by: MISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
      Co-authored-by: MISHANMAUYRA <mishanmaurya31081@gmail.com>
      Co-authored-by: amcamd <andrew.chapman@amd.com>
      Co-authored-by: Prasanth Nunna <root@banff-cyxtera-s78-1.amd.com>
      Co-authored-by: sstamenk <strahinja.stamenkovic@amd.com>
  3. 23 Sep, 2025 1 commit
    • Add CUDA 13.0 Support (#1761) · bdb8b2b7
      Matthew Douglas authored
      * CUDA 13 build enablement
      
      * Try to fix Windows build workflow
      
      * Add torch 2.9+cu130 to tests
      
      * Fix python version
      
      * Update test workflow
      
      * Don't test CPU on torch 2.9 yet
      
      * Update doc
  4. 15 Sep, 2025 1 commit
  5. 20 Jun, 2025 1 commit
    • Enable ROCm backend with custom ops integration (#1683) · 888788d7
      pnunna93 authored
      
      * Port ROCm changes from multi-backend-refactor branch
      
      * Update ops.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update test_ops.py
      
      * Update test_functional.py
      
      * Update test_ops.py
      
      * Update test_functional.py
      
      * Update test_functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update test_functional.py
      
      * Update test_functional.py
      
      * Update cextension.py
      
      * Update cuda_specs.py
      
      * Update cuda_specs.py
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_cuda_setup_evaluator.py
      
      * Update test_functional.py
      
      * Update modules.py
      
      * Update modules.py
      
      * Update ops.py
      
      * Update test_linear4bit.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update test_linear4bit.py
      
      * Update test_linear4bit.py
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Create build-rocm.sh
      
      * Update cuda_specs.py
      
      * Fix trailing whitespace
      
      * Remove conflicts.diff
      
      * update for hipblasVersionMajor >=3
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_ops.py
      
      * Update main.py
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_ops.py
      
      * Update test_linear4bit.py
      
      * Lint
      
      * Lint
      
      * Update helpers.py
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_ops.py
      
      * Lint
      
      * Update pythonInterface.cpp
      
      * lint fix
      
      * lint
      
      * Update pythonInterface.cpp
      
      * revert permissions change
      
      * Fix indentation
      
      * Update kernels_hip.cuh
      
      * Update kernels.hip
      
      * Update ops.hip
      
      * Update ops_hip.cuh
      
      * Update kernels_hip.cuh
      
      * Update kernels.hip
      
      * Update kernels.hip
      
      * Update ops.hip
      
      * Update ops_hip.cuh
      
      * Update ops.hip
      
      * Update CMakeLists.txt
      
      * Update functional.py
      
      * Update cextension.py
      
      * Update cextension.py
      
      ---------
      Co-authored-by: MISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
      Co-authored-by: MISHANMAUYRA <mishanmaurya31081@gmail.com>
      Co-authored-by: amcamd <andrew.chapman@amd.com>
      Co-authored-by: Prasanth Nunna <root@banff-cyxtera-s78-1.amd.com>
  6. 22 Apr, 2025 1 commit
  7. 22 Jan, 2025 1 commit
  8. 05 Dec, 2024 1 commit
    • LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d
      Matthew Douglas authored
      
      * Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
      
      * Fix unintended change
      
      * New naive mm_dequant kernel for row-major; cleanup
      
      * fix
      
      * int8 refactor: initial sparse decomp, cleanup
      
      * Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
      
      * int8: inference optimizations, some cleanup
      
      * int8: more tests passing, cleanup
      
      * int8 - more cleanup, most tests passing
      
      * int8: specify CUDA stream for int8 ops
      
      * perf: reduce overhead from getting cudaStream ptr
      
      * Mark some functions for deprecation.
      
      * int8 sparse decomp: small perf improvement
      
      * update setup.py
      
      * Update bitsandbytes/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
      
      * int8 cleanup
      
      * Ignore ruff rule ISC001 (incompatible with formatter)
      
      * add comment
      
      * int8 more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * int8: rename / deprecate old fn signatures
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * type annotation
      
      * format update
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Add comment to explain division optimization
      
      * more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Type annotations, cleanup
      
      * remove unused kernels; improved type annotations
      
      * small perf optimization for single-GPU systems
      
      * small perf optimization for single-GPU systems
      
      * update docstrings
      
      * Improve docs and tests
      
      * Update docstring
      
      * Update test
      
      * add benchmarking script
      
      * test cleanup: add deprecated marker, move benchmarks out
      
      * Add int8 dequant function; misc improvements
      
      * int8 matmul fallback for inner dims not divisible by 4
      
      * improve register usage of kInt8VectorQuant - especially for A100/H100
      
      * disable fail-fast for package build
      
      * maxwell compat
      
      * ptxas verbose
      
      * docs update
      
      * doc update
      
      * backward fix
      
      * Bugfix sparse decomp
      
      * Int8 fix for PEFT OLoRA init
      
      * Fix test for deprecated spmm_coo
      
      * test improvement
      
      * doc update
      
      * typo
      
      * doc cleanup
      
      * docs
      
      * add inference benchmark script
      
      * Add benchmarks, doc update
      
      ---------
      Co-authored-by: Aarni Koskela <akx@iki.fi>
  9. 09 Sep, 2024 1 commit
  10. 15 Jul, 2024 1 commit
  11. 08 Mar, 2024 1 commit
    • Build: Expand CUDA Toolkit Matrix (#1111) · 1cfc2777
      Matthew Douglas authored
      
      * (ci) build with wider CUDA version matrix
      
      * (ci) build with wider CUDA version matrix
      
      * (ci) skip sm_89 target on CUDA 11.7
      
      * (ci) skip sm_90 target on CUDA 11.8
      
      * modify workflow to publish to test.pypi
      
      * (build) Test for manylinux_2_24 build on GH actions
      
      * (build) got that backwards.
      
      * try fixing manual triggering condition for testpypi
      
      * try if Ubuntu 18.04 is an easy fix to allow for `manylinux_2_24` compatibility
      
      * hardcode publish step to run to test publishing
      
      * set ubuntu to newest supported version
      
      * try statically linking libstdc++ to achieve manylinux_2_18
      
      * last commit only brought us to manylinux_2_34, reverse
      
      * add missing permission for publishing to pypi
      
      * snake case deprecated in favor of kebab
      
      * downgrade cuda ubuntu aiming for manylinux_2_24
      
      * add step to upgrade cmake due to old Ubuntu for CUDA build
      
      * adjust path to prefer pip installed cmake
      
      * (cmake) set CMAKE_BUILD_TYPE=Release if unspecified
      
      * default to CMAKE_BUILD_TYPE Release for optimized releases and better manylinux compatibility
      
      * (build) back to ubuntu22.04 docker images
      
      * verify CMake in separate step
      
      * add clarifying comment about Python version compatibility
      
      * (build) we don't need cmake for wheel step
      
      * fixup testpypi publish to run in PR for testing
      
      * add pypi publishing when tagged on main
      
      * add functionality to rewrite platform tags
      
      * (ci) adjust platform tags for wheels
      
      * fix for windows, get order right.
      
      * fix for windows, get order right.
      
      * (build) slim down those fatbins on windows cuda
      
      * sloppy
      
      * remove broken PyPi upload for now
      
      ---------
      Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
  12. 27 Feb, 2024 2 commits
    • Matthew Douglas's avatar
      cc5f8cd8
    • (cmake) Fix cuda arch selection (#1091) · 753df25c
      Matthew Douglas authored
      * (cmake) Fix generation of targets for nvcc
      
      * Typo
      
      * (ci) linux + CUDA workflow: make sure we specify target architectures
      
      * fix
      
      * fix one more time
      
      * (cmake) Default in CMAKE_CUDA_ARCHITECTURES_ALL when cmake<3.23, make sure we build only selected cubins and only ptx for latest capability
      
      * Fix static lookup for CMAKE_CUDA_ARCHITECTURES_ALL on cmake<3.23
      
      * Remove debug setting
      
      * clarification
  13. 06 Feb, 2024 1 commit
  14. 05 Feb, 2024 1 commit
    • Make native code portable and add GitHub workflow for building (#949) · 73d3e7b6
      Rickard authored
      
      * Make native code portable and add GitHub workflow for building
      
      * Removed deprecated Python versions
      
      * Update python-package.yml
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update python-package.yml
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update python-package.yml
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update python-package.yml
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update python-package.yml
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update python-package.yml
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update python-package.yml
      Co-authored-by: Aarni Koskela <akx@iki.fi>
      
      * Update python-package.yml
      
      * Do not test on Python 3.13 until released
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Refactor build stage
      
      * Fixed breaking actions change
      
      * Slim down Windows cuda
      
      * Create dependabot.yml
      
      * Bespoke local dev requirements.txt
      
      * Enable VS integration
      
      * Group Dependabot updates
      
      * Cleanup
      
      * Update python-package.yml
      
      * Reinstate file that was wrongly merged
      
      * Fixed regression caused by new version of download-artifact
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Fix matrix
      
      * Update python-package.yml
      
      * Merge
      
      * Pipeline
      
      * Fixed conflict
      
      * Fixed conflict
      
      * Update CMakeLists.txt
      
      * Fixed merge error
      
      * cleanup
      
      * cleanup
      
      * Find CUDA
      
      * Fix
      
      * Fixing merge error from latest merge from main
      
      * Fix setup.py
      
      * Fixed typo in artifact name
      
      * Remove linker flags
      
      * Build nocublaslt versions
      
      * Fixed formatting
      
      * Fixed VS Code format on save
      
      * Ran format on save from VScode
      
      * Re-saved the json files using the new settings
      
      * Re-saved CMakeLists.txt to get formatting right
      
      * Add path filter
      
      * Formatting
      
      ---------
      Co-authored-by: Aarni Koskela <akx@iki.fi>
  15. 01 Feb, 2024 1 commit