Bumped version for new release.

Commit a06a0f6a authored Jul 22, 2023 by Tim Dettmers

Showing with 28 additions and 1 deletion

CHANGELOG.md +27 -0

setup.py +1 -1

--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -283,3 +283,30 @@ Bug fixes:
 - Removed outdated get_cuda_lib_handle calls that lead to errors. #595 Thank you @ihsanturk
 - Fixed bug where read-permission was assumed for a file. #497
 - Fixed a bug where prefetchAsync led to errors on GPUs that support unified memory but not prefetching (Maxwell, SM52). #470 #451 #453 #477 Thank you @jllllll and @stoperro
+### 0.41.0
+Features:
+ - Added precompiled CUDA 11.8 binaries to support H100 GPUs without compilation #571
+ - CUDA SETUP no longer looks for libcuda and libcudart and instead relies on the PyTorch CUDA libraries. To manually override this behavior, see how_to_use_nonpytorch_cuda.md. Thank you @rapsealk
+Bug fixes:
+ - Fixed a bug where the default type of absmax was undefined, which led to errors if the default type differed from torch.float32. #553
+ - Fixed a missing scipy dependency in requirements.txt. #544
+ - Fixed a bug where a view operation could cause an error in 8-bit layers.
+ - Fixed a bug where CPU bitsandbytes would fail during import. #593 Thank you @bilelomrani
+ - Fixed a bug where a non-existent LD_LIBRARY_PATH variable led to a failure in python -m bitsandbytes. #588
+ - Removed outdated get_cuda_lib_handle calls that lead to errors. #595 Thank you @ihsanturk
+ - Fixed bug where read-permission was assumed for a file. #497
+ - Fixed a bug where prefetchAsync led to errors on GPUs that support unified memory but not prefetching (Maxwell, SM52). #470 #451 #453 #477 Thank you @jllllll and @stoperro
+Documentation:
+ - Improved documentation for GPUs that do not support 8-bit matmul. #529
+ - Added description and pointers for the NF4 data type. #543
+User experience:
+ - Improved handling of default compute_dtype for Linear4bit Layers, so that compute_dtype = input_dtype if the input data type is stable enough (float32, bfloat16, but not float16).
+Performance:
+ - Improved 4-bit inference performance for A100 GPUs. This slightly degraded performance for A40/RTX 3090 and RTX 4090 GPUs.
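
Note on the "User experience" entry above: the compute_dtype rule for Linear4bit can be read as a small decision function. The sketch below is only an illustration of that rule as worded in the changelog, not the bitsandbytes implementation. The helper name `pick_compute_dtype`, the use of plain strings for dtype names, and the `float32` fallback for unstable inputs are all assumptions made for this example.

```python
# Hypothetical helper, NOT part of bitsandbytes: it only mirrors the rule
# stated in the changelog entry, using dtype names as plain strings.
STABLE_INPUT_DTYPES = {"float32", "bfloat16"}  # float16 is deliberately excluded


def pick_compute_dtype(input_dtype, explicit_compute_dtype=None, fallback="float32"):
    """Return the dtype a 4-bit layer would compute in under the stated rule.

    The `fallback` used for unstable input dtypes (e.g. float16) is an
    assumption of this sketch; the changelog does not say what is chosen.
    """
    # A dtype explicitly requested by the user always wins.
    if explicit_compute_dtype is not None:
        return explicit_compute_dtype
    # Default case: follow the input dtype when it is considered stable enough.
    if input_dtype in STABLE_INPUT_DTYPES:
        return input_dtype
    # Otherwise do not follow the input dtype.
    return fallback


print(pick_compute_dtype("bfloat16"))
print(pick_compute_dtype("float16"))
print(pick_compute_dtype("float16", explicit_compute_dtype="float16"))
```

Under these assumptions, a bfloat16 or float32 input keeps its own dtype for computation, a float16 input does not, and an explicit setting overrides the default in every case.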
--- a/setup.py
+++ b/setup.py
@@ -18,7 +18,7 @@ def read(fname):
 setup(
    name=f"bitsandbytes",
-    version=f"0.40.2",
+    version=f"0.41.0",
    author="Tim Dettmers",
    author_email="dettmers@cs.washington.edu",
    description="k-bit optimizers and matrix multiplication routines.",