- 14 Jan, 2025 1 commit
Matthew Douglas authored
* (chore) Remove unused dotfiles
* cleanup: remove unused kernels/C++ code
- 05 Dec, 2024 1 commit
Matthew Douglas authored
* Start of int8 refactor: remove col32/col_ampere/col_turing transforms in new igemmlt implementation
* Fix unintended change
* New naive mm_dequant kernel for row-major; cleanup
* fix
* int8 refactor: initial sparse decomp, cleanup
* Int8 refactoring: remove separate NO_CUBLASLT build; more cleanup
* int8: inference optimizations, some cleanup
* int8: more tests passing, cleanup
* int8 - more cleanup, most tests passing
* int8: specify CUDA stream for int8 ops
* perf: reduce overhead from getting cudaStream ptr
* Mark some functions for deprecation.
* int8 sparse decomp: small perf improvement
* update setup.py
* Update bitsandbytes/autograd/_functions.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/research/autograd/_functions.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* int8 - perf improvement for sparse decomposition inference; deprecate get_tensor_stream() in favor of new private fn
* int8 cleanup
* Ignore ruff rule ISC001 (incompatible with formatter)
* add comment
* int8 more cleanup
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* int8: rename / deprecate old fn signatures
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* type annotation
* format update
* Update bitsandbytes/research/autograd/_functions.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* cleanup
* Add comment to explain division optimization
* more cleanup
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update bitsandbytes/functional.py
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* cleanup
* Type annotations, cleanup
* remove unused kernels; improved type annotations
* small perf optimization for single-GPU systems
* small perf optimization for single-GPU systems
* update docstrings
* Improve docs and tests
* Update docstring
* Update test
* add benchmarking script
* test cleanup: add deprecated marker, move benchmarks out
* Add int8 dequant function; misc improvements
* int8 matmul fallback for inner dims not divisible by 4
* improve register usage of kInt8VectorQuant - especially for A100/H100
* disable fail-fast for package build
* maxwell compat
* ptxas verbose
* docs update
* doc update
* backward fix
* Bugfix sparse decomp
* Int8 fix for PEFT OLoRA init
* Fix test for deprecated spmm_coo
* test improvement
* doc update
* typo
* doc cleanup
* docs
* add inference benchmark script
* Add benchmarks, doc update

---------

Co-authored-by: Aarni Koskela <akx@iki.fi>
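The int8 refactor above revolves around row-wise ("vector-wise") int8 quantization and a dequantization step that rescales the integer matmul output. The following is a minimal pure-Python sketch of that pipeline, assuming an LLM.int8()-style scheme. Each row of the activations and of the weight matrix is scaled by its absmax into [-127, 127], the integer product is accumulated, and every output element is multiplied by its row scale and column scale divided by 127². The names `rowwise_absmax_quant`, `int8_matmul_nt`, and `dequantize_product` are hypothetical stand-ins, not the bitsandbytes API, and the sparse outlier decomposition is omitted.

```python
from typing import List, Tuple

IntMatrix = List[List[int]]
FloatMatrix = List[List[float]]


def rowwise_absmax_quant(x: FloatMatrix) -> Tuple[IntMatrix, List[float]]:
    """Quantize each row to the int8 range [-127, 127], scaled by that row's absmax."""
    codes: IntMatrix = []
    stats: List[float] = []
    for row in x:
        absmax = max(abs(v) for v in row)
        stats.append(absmax)
        if absmax == 0.0:
            codes.append([0] * len(row))  # all-zero row: nothing to scale
        else:
            codes.append([round(v * 127.0 / absmax) for v in row])
    return codes, stats


def int8_matmul_nt(a_q: IntMatrix, b_q: IntMatrix) -> IntMatrix:
    """C = A @ B^T for A (m x k) and B (n x k); Python ints stand in for int32 accumulators."""
    return [[sum(x * y for x, y in zip(a_row, b_row)) for b_row in b_q] for a_row in a_q]


def dequantize_product(c: IntMatrix, row_stats: List[float], col_stats: List[float]) -> FloatMatrix:
    """Map the integer product back to float: C[i][j] * row_stats[i] * col_stats[j] / 127^2."""
    inv = 1.0 / (127.0 * 127.0)  # compute the constant reciprocal once instead of dividing per element
    return [
        [value * ra * cb * inv for value, cb in zip(c_row, col_stats)]
        for c_row, ra in zip(c, row_stats)
    ]


# Example: activations A (2 x 3) and weights W (2 x 3), so the output is A @ W^T (2 x 2).
A = [[1.0, -2.0, 0.5], [0.25, 0.0, 4.0]]
W = [[1.0, 1.0, 1.0], [2.0, -1.0, 0.5]]
a_q, a_stats = rowwise_absmax_quant(A)
w_q, w_stats = rowwise_absmax_quant(W)
out = dequantize_product(int8_matmul_nt(a_q, w_q), a_stats, w_stats)
print(out)  # close to the exact result [[-0.5, 4.25], [4.25, 2.5]]
```

Keeping one scale per row, rather than one per tensor, limits the effect of a large value to the row that contains it. The price is storing an extra vector of row statistics for each operand.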
- 23 Oct, 2024 1 commit
Aarni Koskela authored
* Update pre-commit tools
* Fix typos
- 20 Sep, 2024 2 commits
Matthew Douglas authored
* Change 8bit optimizer blocksize 2048->256; additional bf16 support
* Update tolerances for 8bit optimizer tests
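The commit above shrinks the 8-bit optimizer block size from 2048 to 256. The trade-off can be illustrated with blockwise absmax quantization. Each block stores one float scale, so smaller blocks cost more scales (16 instead of 2 per 4096 values). In exchange, they confine the damage a single outlier does to its own block. This is only an illustration: bitsandbytes' 8-bit optimizer states use a non-linear (dynamic) quantization map, whereas the hypothetical helpers `absmax_quantize_blocks` and `absmax_dequantize_blocks` below substitute plain linear int8 for simplicity.

```python
from typing import List, Tuple


def absmax_quantize_blocks(x: List[float], blocksize: int) -> Tuple[List[int], List[float]]:
    """Linear int8 quantization with one absmax scale per block of `blocksize` values."""
    codes: List[int] = []
    absmaxes: List[float] = []
    for start in range(0, len(x), blocksize):
        block = x[start:start + blocksize]
        absmax = max(abs(v) for v in block)
        absmaxes.append(absmax)
        scale = 127.0 / absmax if absmax > 0 else 0.0
        codes.extend(round(v * scale) for v in block)
    return codes, absmaxes


def absmax_dequantize_blocks(codes: List[int], absmaxes: List[float], blocksize: int) -> List[float]:
    """Invert the mapping above: each code is rescaled by its own block's absmax."""
    return [c * absmaxes[i // blocksize] / 127.0 for i, c in enumerate(codes)]


# 4095 small values followed by one large outlier (4096 values total).
x = [0.01] * 4095 + [100.0]
for blocksize in (2048, 256):
    codes, absmaxes = absmax_quantize_blocks(x, blocksize)
    restored = absmax_dequantize_blocks(codes, absmaxes, blocksize)
    zeroed = sum(1 for v in restored if v == 0.0)
    print(f"blocksize={blocksize}: {len(absmaxes)} scales, {zeroed} values flushed to zero")
```

In this toy input, a block size of 2048 flushes the 2047 small values that share the outlier's block to zero. A block size of 256 flushes only 255 of them.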
Matthew Douglas authored
* Add AdEMAMix optimizer
* Add PagedAdEMAMix32bit, AdEMAMix32bit
* Add PagedAdEMAMix32bit, AdEMAMix32bit
* AdEMAMix: add support for alpha/beta3 scheduling
* Update paged AdEMAMix
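AdEMAMix extends Adam with a second, slow exponential moving average (EMA) of the gradients. This slow EMA is added to the update numerator with weight alpha. The alpha/beta3 scheduling mentioned above ramps alpha up from 0 and beta3 up from beta1 over a number of steps. The following is a minimal scalar sketch of one update step, written from our reading of the AdEMAMix paper (Pagliardini, Ablin, and Grangier, 2024) rather than from the bitsandbytes kernels. The function names, the dict-of-lists state layout, and the default hyperparameters are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def alpha_schedule(step: int, alpha: float, t_alpha: int) -> float:
    """Linear warmup of alpha from 0 to its final value over t_alpha steps (0 disables it)."""
    if t_alpha <= 0:
        return alpha
    return min(step * alpha / t_alpha, alpha)


def beta3_schedule(step: int, beta_start: float, beta_end: float, t_beta3: int) -> float:
    """Increase beta3 from beta_start to beta_end over t_beta3 steps (0 disables it)."""
    if t_beta3 <= 0:
        return beta_end
    frac = min(step / t_beta3, 1.0)
    log_s, log_e = math.log(beta_start), math.log(beta_end)
    value = math.exp(log_s * log_e / ((1.0 - frac) * log_e + frac * log_s))
    return min(value, beta_end)


def ademamix_step(
    params: List[float],
    grads: List[float],
    state: Dict[str, List[float]],
    step: int,  # 1-based step counter
    lr: float = 1e-3,
    betas: Tuple[float, float, float] = (0.9, 0.999, 0.9999),
    alpha: float = 5.0,
    eps: float = 1e-8,
    weight_decay: float = 0.0,
    t_alpha: int = 0,
    t_beta3: int = 0,
) -> None:
    """Apply one AdEMAMix update in place to a flat list of parameters."""
    beta1, beta2, beta3_final = betas
    a = alpha_schedule(step, alpha, t_alpha)
    beta3 = beta3_schedule(step, beta1, beta3_final, t_beta3)
    bias1 = 1.0 - beta1 ** step  # bias correction for the fast EMA
    bias2 = 1.0 - beta2 ** step  # bias correction for the second moment
    m1, m2, nu = state["m1"], state["m2"], state["nu"]
    for i, g in enumerate(grads):
        m1[i] = beta1 * m1[i] + (1.0 - beta1) * g  # fast EMA of gradients
        m2[i] = beta3 * m2[i] + (1.0 - beta3) * g  # slow EMA, not bias-corrected
        nu[i] = beta2 * nu[i] + (1.0 - beta2) * g * g  # EMA of squared gradients
        update = (m1[i] / bias1 + a * m2[i]) / (math.sqrt(nu[i] / bias2) + eps)
        # Decoupled (AdamW-style) weight decay.
        params[i] -= lr * (update + weight_decay * params[i])


# One step on a single scalar parameter, with both schedules spanning 1000 steps.
params = [1.0]
state = {"m1": [0.0], "m2": [0.0], "nu": [0.0]}
ademamix_step(params, [0.5], state, step=1, t_alpha=1000, t_beta3=1000)
print(params)  # slightly below 1.0: roughly one learning-rate step, as in Adam's first update
```

Setting `t_alpha` and `t_beta3` to 0 disables both schedules. The slow EMA gets no bias correction, which matches our reading of the paper.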
- 26 Aug, 2024 1 commit
Abhilash Majumder authored
* remove kcompress
* fix initial template call
* fix function name
* remove vector load
* cleanup reduce & rearrange
* format
- 22 Aug, 2024 1 commit
Jee Jee Li authored
* Done
* fix format
* fix format
* fix format
* fix format
* Address format error and fix default arg bug
* Refine stream argument passing mechanism
* Fix bug
* Delete unused code
- 12 Jul, 2024 1 commit
Markus Hennerbichler authored
pythonInterface.cpp depends on ops.cuh, which in turn includes some thrust headers. pythonInterface.cpp is compiled as a C++ compilation unit, which is problematic because thrust doesn't guarantee compatibility with a host compiler. This is starting to cause issues with CUDA 12.5. Nothing in the code actually uses these thrust headers, so they can be removed without other consequences.
- 29 Mar, 2024 1 commit
Matthew Douglas authored
- 23 Feb, 2024 1 commit
Titus von Koeller authored
- 14 Feb, 2024 1 commit
pnunna93 authored
- 05 Feb, 2024 3 commits
Rickard authored
* Make native code portable and add GitHub workflow for building
* Removed deprecated Python versions
* Update python-package.yml
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update python-package.yml
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update python-package.yml
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update python-package.yml
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update python-package.yml
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update python-package.yml
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update python-package.yml
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update python-package.yml
* Do not test on Python 3.13 until released
* Update python-package.yml
* Update python-package.yml
* Update python-package.yml
* Update python-package.yml
* Refactor build stage
* Fixed breaking actions change
* Slim down Windows cuda
* Create dependabot.yml
* Bespoke local dev requirements.txt
* Enable VS integration
* Group Dependabot updates
* Cleanup
* Update python-package.yml
* Reinstate file that was wrongly merged
* Fixed regression caused by new version of download-artifact
* Update python-package.yml
* Update python-package.yml
* Fix matrix
* Update python-package.yml
* Merge
* Pipeline
* Fixed conflict
* Fixed conflict
* Update CMakeLists.txt
* Fixed merge error
* cleanup
* cleanup
* Find CUDA
* Fix
* Fixing merge error from latest merge from main
* Fix setup.py
* Fixed typo in artifact name
* Remove linker flags
* Build nocublaslt versions
* Fixed formatting
* Fixed VS Code format on save
* Ran format on save from VScode
* Re-saved the json files using the new settings
* Re-saved CMakeLists.txt to get formatting right
* Add path filter
* Formatting

---------

Co-authored-by: Aarni Koskela <akx@iki.fi>
Rickard authored
Aarni Koskela authored
Co-authored-by: Titus von Koeller <titus@vonkoeller.com>
fix erroneous correction
- 01 Feb, 2024 1 commit
Aarni Koskela authored
- 31 Jan, 2024 1 commit
James Wyatt authored
Based on work by @Jamezo97 and @acpopescu, manually cherry-picked from PR #788 and PR #229, with cleanup by wkpark.
Signed-off-by: Won-Kyu Park <wkpark@gmail.com>
- 30 Jan, 2024 1 commit
Aarni Koskela authored
- 09 Dec, 2023 1 commit
修艺 authored
- 19 Jul, 2023 1 commit
Tim Dettmers authored
- 17 Jul, 2023 1 commit
Tim Dettmers authored
- 11 Jul, 2023 1 commit
Tim Dettmers authored
- 10 Jul, 2023 5 commits
Tim Dettmers authored
Tim Dettmers authored
Tim Dettmers authored
Tim Dettmers authored
Tim Dettmers authored
- 09 Jul, 2023 2 commits
Tim Dettmers authored
Tim Dettmers authored
- 08 Jul, 2023 2 commits
Tim Dettmers authored
Tim Dettmers authored
- 05 Jul, 2023 1 commit
Tim Dettmers authored
- 04 Jul, 2023 2 commits
Tim Dettmers authored
Tim Dettmers authored
- 31 May, 2023 3 commits
Tim Dettmers authored
Tim Dettmers authored
Tim Dettmers authored
- 24 May, 2023 1 commit
Tim Dettmers authored
- 06 May, 2023 1 commit
Tim Dettmers authored
- 02 May, 2023 2 commits
Tim Dettmers authored
Tim Dettmers authored