1. 27 Jun, 2025 1 commit
    • Matthew Douglas's avatar
      Add CUDA 12.9 build (#1689) · 1abd5e78
      Matthew Douglas authored
      * Add CUDA 12.9 to build/test workflows
      
      * Downgrade Jimver/cuda-toolkit to v0.2.24
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Update tests.yml
      
      * Update tests.yml
      1abd5e78
  2. 20 Jun, 2025 1 commit
    • pnunna93's avatar
      Enable ROCm backend with custom ops integration (#1683) · 888788d7
      pnunna93 authored
      
      
      * Port ROCm changes from multi-backend-refactor branch
      
      * Update ops.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update test_ops.py
      
      * Update test_functional.py
      
      * Update test_ops.py
      
      * Update test_functional.py
      
      * Update test_functional.py
      
      * Update functional.py
      
      * Update functional.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update test_functional.py
      
      * Update test_functional.py
      
      * Update cextension.py
      
      * Update cuda_specs.py
      
      * Update cuda_specs.py
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_cuda_setup_evaluator.py
      
      * Update test_functional.py
      
      * Update modules.py
      
      * Update modules.py
      
      * Update ops.py
      
      * Update test_linear4bit.py
      
      * Update ops.py
      
      * Update ops.py
      
      * Update test_linear4bit.py
      
      * Update test_linear4bit.py
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Update python-package.yml
      
      * Create build-rocm.sh
      
      * Update cuda_specs.py
      
      * Fix trailing whitespace
      
      * Remove conflicts.diff
      
      * update for hipblasVersionMajor >=3
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_ops.py
      
      * Update main.py
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_ops.py
      
      * Update test_linear4bit.py
      
      * Lint
      
      * Lint
      
      * Update helpers.py
      
      * Update test_functional.py
      
      * Update test_linear4bit.py
      
      * Update test_ops.py
      
      * Lint
      
      * Update pythonInterface.cpp
      
      * lint fix
      
      * lint
      
      * Update pythonInterface.cpp
      
      * revert permissions change
      
      * Fix indentation
      
      * Update kernels_hip.cuh
      
      * Update kernels.hip
      
      * Update ops.hip
      
      * Update ops_hip.cuh
      
      * Update kernels_hip.cuh
      
      * Update kernels.hip
      
      * Update kernels.hip
      
      * Update ops.hip
      
      * Update ops_hip.cuh
      
      * Update ops.hip
      
      * Update CMakeLists.txt
      
      * Update functional.py
      
      * Update cextension.py
      
      * Update cextension.py
      
      ---------
      Co-authored-by: default avatarMISHANMAURYA <118961433+MISHANMAURYA@users.noreply.github.com>
      Co-authored-by: default avatarMISHANMAUYRA <mishanmaurya31081@gmail.com>
      Co-authored-by: default avataramcamd <andrew.chapman@amd.com>
      Co-authored-by: default avatarPrasanth Nunna <root@banff-cyxtera-s78-1.amd.com>
      888788d7
  3. 13 May, 2025 1 commit
  4. 02 May, 2025 1 commit
    • Matthew Douglas's avatar
      Linux aarch64 CI updates (#1622) · 49c044b1
      Matthew Douglas authored
      * Add aarch64 cpu tests and CUDA build to nightly workflow
      
      * aarch64: limit CUDA targets to sm75, sm80, sm90, sm100
      
      * aarch64: limit CUDA targets to sm75, sm80, sm90, sm100
      
      * Update build cpu script
      
      * fix
      
      * Update auditwheel for aarch64
      49c044b1
  5. 29 Apr, 2025 1 commit
    • Matthew Douglas's avatar
      Set up nightly CI for unit tests (#1619) · a5dd01bb
      Matthew Douglas authored
      * Run unit tests on GH Actions
      
      * fix
      
      * fix
      
      * trigger workflow
      
      * Update
      
      * Update
      
      * Update
      
      * Run tests nightly
      
      * Disable paged optimizer test on Windows
      
      * Skip unit tests on Windows for CUDA 12.x (driver on runner is too old)
      a5dd01bb
  6. 22 Apr, 2025 1 commit
  7. 24 Feb, 2025 2 commits
  8. 23 Jan, 2025 1 commit
  9. 22 Jan, 2025 1 commit
  10. 05 Dec, 2024 1 commit
    • Matthew Douglas's avatar
      LLM.int8() Refactoring: Part 1 (#1401) · 81e6345d
      Matthew Douglas authored
      
      
      * Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
      
      * Fix unintended change
      
      * New naive mm_dequant kernel for row-major; cleanup
      
      * fix
      
      * int8 refactor: initial sparse decomp, cleanup
      
      * Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
      
      * int8: inference optimizations, some cleanup
      
      * int8: more tests passing, cleanup
      
      * int8 - more cleanup, most tests passing
      
      * int8: specify CUDA stream for int8 ops
      
      * perf: reduce overhead from getting cudaStream ptr
      
      * Mark some functions for deprecation.
      
      * int8 sparse decomp: small perf improvement
      
      * update setup.py
      
      * Update bitsandbytes/autograd/_functions.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
      
      * int8 cleanup
      
      * Ignore ruff rule ISC001 (incompatible with formatter)
      
      * add comment
      
      * int8 more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * int8: rename / deprecate old fn signatures
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * type annotation
      
      * format update
      
      * Update bitsandbytes/research/autograd/_functions.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Add comment to explain division optimization
      
      * more cleanup
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * Update bitsandbytes/functional.py
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      
      * cleanup
      
      * Type annotations, cleanup
      
      * remove unused kernels; improved type annotations
      
      * small perf optimization for single-GPU systems
      
      * small perf optimization for single-GPU systems
      
      * update docstrings
      
      * Improve docs and tests
      
      * Update docstring
      
      * Update test
      
      * add benchmarking script
      
      * test cleanup: add deprecated marker, move benchmarks out
      
      * Add int8 dequant function; misc improvements
      
      * int8 matmul fallback for inner dims not divisible by 4
      
      * improve register usage of kInt8VectorQuant - especially for A100/H100
      
      * disable fail-fast for package build
      
      * maxwell compat
      
      * ptxas verbose
      
      * docs update
      
      * doc update
      
      * backward fix
      
      * Bugfix sparse decomp
      
      * Int8 fix for PEFT OLoRA init
      
      * Fix test for deprecated spmm_coo
      
      * test improvement
      
      * doc update
      
      * typo
      
      * doc cleanup
      
      * docs
      
      * add inference benchmark script
      
      * Add benchmarks, doc update
      
      ---------
      Co-authored-by: default avatarAarni Koskela <akx@iki.fi>
      81e6345d
  11. 13 Mar, 2024 1 commit
  12. 11 Mar, 2024 2 commits