OpenDAS / bitsandbytes
Commit 3901ebf7, authored Jan 04, 2023 by Tim Dettmers

Added CUDA 12.0 support; removed CC 3.0 support.

Parent: b3de1921

Showing 6 changed files with 92 additions and 22 deletions

CHANGELOG.md      +35 -1
Makefile          +28 -16
README.md         +2 -2
cuda_install.sh   +4 -0
deploy.sh         +22 -2
setup.py          +1 -1
CHANGELOG.md

@@ -139,7 +139,6 @@ Features:
 Bug fixes:
  - Fixed a problem where warning messages would be displayed even though everything worked correctly.
 ### 0.35.2
 Bug fixes:
@@ -155,3 +154,38 @@ Bug fixes:
 Bug fixes:
  - Fixed a bug where the CUDA setup failed when the cuda runtime was found, but not the cuda library.
  - Fixed a bug where not finding the cuda runtime led to an incomprehensible error.
+### 0.36.0
+#### Improvements, Ada/Hopper support, fake k-bit quantization.
+Features:
+ - CUDA 11.8 and 12.0 support added
+ - support for Ada and Hopper GPUs added (compute capability 8.9 and 9.0)
+ - support for fake k-bit block-wise quantization for Int, Float, quantile quantization, and dynamic exponent data types added
+ - Added CUDA instruction generator to fix some installations.
+ - Added additional block sizes for quantization {64, 128, 256, 512, 1024}
+ - Added SRAM Quantile algorithm to quickly estimate less than 256 quantiles
+ - Added option to suppress the bitsandbytes welcome message (@Cyberes)
+Regression:
+ - Compute capability 3.0 removed: GTX 600 and 700 series are no longer supported (except GTX 780 and GTX 780 Ti)
+Bug fixes:
+ - fixed a bug where too long directory names would crash the CUDA SETUP #35 (@tomaarsen)
+ - fixed a bug where CPU installations on Colab would run into an error #34 (@tomaarsen)
+ - fixed an issue where the default CUDA version with fast-DreamBooth was not supported #52
+ - fixed a bug where the CUDA setup failed due to a wrong function call.
+ - fixed a bug in the CUDA Setup which led to an incomprehensible error if no GPU was detected.
+ - fixed a bug where the CUDA setup failed when the cuda runtime was found, but not the cuda library.
+ - fixed a bug where not finding the cuda runtime led to an incomprehensible error.
+ - fixed a bug where with missing CUDA the default was an error instead of loading the CPU library
+ - fixed a bug where the CC version of the GPU was not detected appropriately (@BlackHC)
+ - fixed a bug in CPU quantization which led to errors when the input buffer exceeded 2^31 elements
+Improvements:
+ - multiple improvements in formatting, removal of unused imports, and slight performance improvements (@tomaarsen)
+ - StableEmbedding layer now has device and dtype parameters to make it 1:1 replaceable with regular Embedding layers (@lostmsu)
+ - runtime performance of block-wise quantization slightly improved
+ - added error message for the case multiple libcudart.so are installed and bitsandbytes picks the wrong one
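
To illustrate the block-wise quantization that the 0.36.0 notes refer to, here is a minimal pure-Python sketch of absmax quantization with one scale per block. It is not the bitsandbytes implementation: the function names, the signed 8-bit code range, and the list-based interface are assumptions chosen for illustration. Only the block sizes come from the changelog.

```python
# Illustrative sketch only; not the bitsandbytes kernels.
BLOCK_SIZES = (64, 128, 256, 512, 1024)  # block sizes listed in the 0.36.0 changelog


def quantize_blockwise(values, blocksize=64):
    """Map floats to signed 8-bit codes, storing one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        # Fall back to 1.0 so an all-zero block does not divide by zero.
        absmax = max(abs(v) for v in block) or 1.0
        scales.append(absmax)
        codes.extend(round(v / absmax * 127) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, blocksize=64):
    """Invert quantize_blockwise up to rounding error."""
    return [c / 127 * scales[i // blocksize] for i, c in enumerate(codes)]
```

Smaller blocks keep a single outlier from stretching the scale of many neighbouring values, at the cost of storing more scales.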
Makefile

@@ -22,12 +22,11 @@ BUILD_DIR:= $(ROOT_DIR)/build
 FILES_CUDA := $(CSRC)/ops.cu $(CSRC)/kernels.cu
 FILES_CPP := $(CSRC)/common.cpp $(CSRC)/cpu_ops.cpp $(CSRC)/pythonInterface.c
-INCLUDE := -I $(CUDA_HOME)/include -I $(ROOT_DIR)/csrc -I $(CONDA_PREFIX)/include -I $(ROOT_DIR)/dependencies/cub -I $(ROOT_DIR)/include
+INCLUDE := -I $(CUDA_HOME)/include -I $(ROOT_DIR)/csrc -I $(CONDA_PREFIX)/include -I $(ROOT_DIR)/include
+INCLUDE_10x := -I $(CUDA_HOME)/include -I $(ROOT_DIR)/csrc -I $(ROOT_DIR)/dependencies/cub -I $(ROOT_DIR)/include
 LIB := -L $(CUDA_HOME)/lib64 -lcudart -lcublas -lcublasLt -lcurand -lcusparse -L $(CONDA_PREFIX)/lib
 # NVIDIA NVCC compilation flags
-COMPUTE_CAPABILITY := -gencode arch=compute_35,code=sm_35 # Kepler
-COMPUTE_CAPABILITY += -gencode arch=compute_37,code=sm_37 # Kepler
 COMPUTE_CAPABILITY += -gencode arch=compute_50,code=sm_50 # Maxwell
 COMPUTE_CAPABILITY += -gencode arch=compute_52,code=sm_52 # Maxwell
 COMPUTE_CAPABILITY += -gencode arch=compute_60,code=sm_60 # Pascal
@@ -35,11 +34,10 @@ COMPUTE_CAPABILITY += -gencode arch=compute_61,code=sm_61 # Pascal
 COMPUTE_CAPABILITY += -gencode arch=compute_70,code=sm_70 # Volta
 COMPUTE_CAPABILITY += -gencode arch=compute_72,code=sm_72 # Volta
-# CUDA 9.2 supports CC 3.0, but CUDA >= 11.0 does not
-CC_CUDA92 := -gencode arch=compute_30,code=sm_30
+CC_KEPLER := -gencode arch=compute_35,code=sm_35 # Kepler
+CC_KEPLER += -gencode arch=compute_37,code=sm_37 # Kepler
 # Later versions of CUDA support the new architectures
-CC_CUDA10x := -gencode arch=compute_30,code=sm_30
 CC_CUDA10x += -gencode arch=compute_75,code=sm_75
 CC_CUDA110 := -gencode arch=compute_75,code=sm_75
@@ -49,6 +47,7 @@ CC_CUDA11x := -gencode arch=compute_75,code=sm_75
 CC_CUDA11x += -gencode arch=compute_80,code=sm_80
 CC_CUDA11x += -gencode arch=compute_86,code=sm_86
 CC_cublasLt110 := -gencode arch=compute_75,code=sm_75
 CC_cublasLt110 += -gencode arch=compute_80,code=sm_80
@@ -56,30 +55,38 @@ CC_cublasLt111 := -gencode arch=compute_75,code=sm_75
 CC_cublasLt111 += -gencode arch=compute_80,code=sm_80
 CC_cublasLt111 += -gencode arch=compute_86,code=sm_86
+CC_ADA_HOPPER := -gencode arch=compute_89,code=sm_89
+CC_ADA_HOPPER += -gencode arch=compute_90,code=sm_90
 all: $(ROOT_DIR)/dependencies/cub $(BUILD_DIR) env
-	$(NVCC) $(COMPUTE_CAPABILITY) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR)
-	$(NVCC) $(COMPUTE_CAPABILITY) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_KEPLER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR)
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_KEPLER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION).so $(LIB)
 cuda92: $(ROOT_DIR)/dependencies/cub $(BUILD_DIR) env
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA92) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA92) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA92) $(CC_KEPLER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA92) $(CC_KEPLER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION)_nocublaslt.so $(LIB)
 cuda10x_nomatmul: $(ROOT_DIR)/dependencies/cub $(BUILD_DIR) env
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA10x) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA10x) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA10x) $(CC_KEPLER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE_10x) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA10x) $(CC_KEPLER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION)_nocublaslt.so $(LIB)
 cuda110_nomatmul: $(BUILD_DIR) env
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA110) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA110) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA110) $(CC_KEPLER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA110) $(CC_KEPLER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION)_nocublaslt.so $(LIB)
 cuda11x_nomatmul: $(BUILD_DIR) env
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) $(CC_KEPLER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) $(CC_KEPLER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION)_nocublaslt.so $(LIB)
+cuda12x_nomatmul: $(BUILD_DIR) env
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) $(CC_ADA_HOPPER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) $(CC_ADA_HOPPER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION)_nocublaslt.so $(LIB)
 cuda110: $(BUILD_DIR) env
@@ -92,6 +99,11 @@ cuda11x: $(BUILD_DIR) env
 	$(NVCC) $(CC_cublasLt111) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION).so $(LIB)
+cuda12x: $(BUILD_DIR) env
+	$(NVCC) $(CC_cublasLt111) $(CC_ADA_HOPPER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR)
+	$(NVCC) $(CC_cublasLt111) $(CC_ADA_HOPPER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION).so $(LIB)
 cpuonly: $(BUILD_DIR) env
 	$(GPP) -std=c++14 -shared -fPIC -I $(ROOT_DIR)/csrc -I $(ROOT_DIR)/include $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cpu.so
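
Each `-gencode arch=compute_XY,code=sm_XY` flag in the Makefile asks nvcc to emit code for one compute capability, and the variables simply group those flags per architecture family. The Python sketch below mirrors that grouping for the groups used by the `cuda12x` target. The helper name `gencode_flags` and the Python variable names are hypothetical; the capability numbers are copied from the diff.

```python
def gencode_flags(capabilities):
    """Return one nvcc -gencode flag per compute capability, e.g. "89" for CC 8.9."""
    return [f"-gencode arch=compute_{cc},code=sm_{cc}" for cc in capabilities]


# Variable groups as they appear in the Makefile after this commit.
CC_KEPLER = gencode_flags(["35", "37"])
CC_ADA_HOPPER = gencode_flags(["89", "90"])
CC_CUBLASLT111 = gencode_flags(["75", "80", "86"])

# The cuda12x target passes CC_cublasLt111 followed by CC_ADA_HOPPER to nvcc.
cuda12x_flags = " ".join(CC_CUBLASLT111 + CC_ADA_HOPPER)
```

With compute capability 3.0 dropped, no group emits an `sm_30` flag any more.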
...
...
README.md

@@ -50,9 +50,9 @@ Requirements: anaconda, cudatoolkit, pytorch
 Hardware requirements:
  - LLM.int8(): NVIDIA Turing (RTX 20xx; T4) or Ampere GPU (RTX 30xx; A4-A100); (a GPU from 2018 or newer).
- - 8-bit optimizers and quantization: NVIDIA Maxwell GPU or newer (>=GTX 9XX).
+ - 8-bit optimizers and quantization: NVIDIA Kepler GPU or newer (>=GTX 78X).
-Supported CUDA versions: 10.2 - 11.7
+Supported CUDA versions: 10.2 - 12.0
 The bitsandbytes library is currently only supported on Linux distributions. Windows is not supported at the moment.
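
The hardware requirements above can be read as thresholds on the GPU's compute capability. The sketch below encodes that reading under two assumptions that are not stated in the README itself: Turing corresponds to compute capability 7.5, and the GTX 780 corresponds to 3.5 (the lowest architecture in the Makefile's `CC_KEPLER` group). The function name and the dictionary keys are hypothetical.

```python
def supported_features(major, minor):
    """Map a (major, minor) compute capability to the features the README lists."""
    cc = (major, minor)
    return {
        # Assumed threshold: Kepler GTX 78X and newer, i.e. CC >= 3.5.
        "8bit_optimizers_and_quantization": cc >= (3, 5),
        # Assumed threshold: Turing and newer, i.e. CC >= 7.5.
        "llm_int8": cc >= (7, 5),
    }
```

For example, `supported_features(3, 0)` reports no supported features, which matches the removal of compute capability 3.0 in this commit.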
...
...
cuda_install.sh

@@ -11,6 +11,7 @@ URL115=https://developer.download.nvidia.com/compute/cuda/11.5.2/local_installer
 URL116=https://developer.download.nvidia.com/compute/cuda/11.6.2/local_installers/cuda_11.6.2_510.47.03_linux.run
 URL117=https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
 URL118=https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
+URL120=https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda_12.0.0_525.60.13_linux.run
 CUDA_VERSION=$1
@@ -56,6 +57,9 @@ if [[ -n "$CUDA_VERSION" ]]; then
   elif [[ "$CUDA_VERSION" -eq "118" ]]; then
     URL=$URL118
     FOLDER=cuda-11.8
+  elif [[ "$CUDA_VERSION" -eq "120" ]]; then
+    URL=$URL120
+    FOLDER=cuda-12.0
   else
     echo "argument error: No cuda version passed as input. Choose among: {111, 115}"
   fi
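
The `elif` chain maps a three-digit version code to an installer URL and an install folder. A table-driven equivalent is sketched below in Python for the two versions visible in this hunk. The function name `resolve_installer` and the error text are hypothetical; the URLs and folder names are copied from the diff.

```python
# Installer URLs copied from cuda_install.sh; only the versions shown in this hunk.
INSTALLER_URLS = {
    "118": "https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run",
    "120": "https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda_12.0.0_525.60.13_linux.run",
}


def resolve_installer(version_code):
    """Return (url, folder) for a version code such as "120"."""
    if version_code not in INSTALLER_URLS:
        choices = ", ".join(sorted(INSTALLER_URLS))
        raise ValueError(f"unknown CUDA version {version_code!r}; choose among: {choices}")
    # "120" -> "cuda-12.0": the last digit is the minor version.
    folder = f"cuda-{version_code[:-1]}.{version_code[-1]}"
    return INSTALLER_URLS[version_code], folder
```

Deriving the list of valid choices from the table keeps the error message in step with the supported versions.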
...
...
deploy.sh

@@ -110,7 +110,7 @@ fi
 make clean
 export CUDA_HOME=$BASE_PATH/cuda-11.8
-make cuda11x CUDA_VERSION=118
+make cuda12x CUDA_VERSION=118
 if [ ! -f "./bitsandbytes/libbitsandbytes_cuda118.so" ]; then
   # Control will enter here if $DIRECTORY doesn't exist.
@@ -118,6 +118,16 @@ if [ ! -f "./bitsandbytes/libbitsandbytes_cuda118.so" ]; then
   exit 64
 fi
+make clean
+export CUDA_HOME=$BASE_PATH/cuda-12.0
+make cuda12x CUDA_VERSION=120
+if [ ! -f "./bitsandbytes/libbitsandbytes_cuda120.so" ]; then
+  # Control will enter here if $DIRECTORY doesn't exist.
+  echo "Compilation unsuccessful!" 1>&2
+  exit 64
+fi
 make clean
 export CUDA_HOME=$BASE_PATH/cuda-10.2
@@ -213,7 +223,7 @@ fi
 make clean
 export CUDA_HOME=$BASE_PATH/cuda-11.8
-make cuda11x_nomatmul CUDA_VERSION=118
+make cuda12x_nomatmul CUDA_VERSION=118
 if [ ! -f "./bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so" ]; then
   # Control will enter here if $DIRECTORY doesn't exist.
@@ -221,5 +231,15 @@ if [ ! -f "./bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so" ]; then
   exit 64
 fi
+make clean
+export CUDA_HOME=$BASE_PATH/cuda-12.0
+make cuda12x_nomatmul CUDA_VERSION=120
+if [ ! -f "./bitsandbytes/libbitsandbytes_cuda120_nocublaslt.so" ]; then
+  # Control will enter here if $DIRECTORY doesn't exist.
+  echo "Compilation unsuccessful!" 1>&2
+  exit 64
+fi
 python -m build
 python -m twine upload dist/* --verbose
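
Every build step in deploy.sh is followed by a check that the expected shared object exists. The naming pattern behind those checks can be sketched in Python as follows. The helper names `library_name` and `missing_libraries` are hypothetical; the file-name pattern is taken from the Makefile targets and the checks in the diff.

```python
from pathlib import Path


def library_name(cuda_version, cublaslt=True):
    """Shared-object name the Makefile targets write for a given CUDA_VERSION."""
    suffix = "" if cublaslt else "_nocublaslt"
    return f"libbitsandbytes_cuda{cuda_version}{suffix}.so"


def missing_libraries(package_dir, builds):
    """Return the expected library names that are absent from package_dir.

    builds is an iterable of (cuda_version, cublaslt) pairs.
    """
    expected = [library_name(version, cublaslt) for version, cublaslt in builds]
    return [name for name in expected if not (Path(package_dir) / name).is_file()]
```

A release script could call `missing_libraries("./bitsandbytes", [("118", True), ("120", True), ("118", False), ("120", False)])` once at the end and abort if the list is non-empty, instead of repeating the check after every build.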
setup.py

@@ -18,7 +18,7 @@ def read(fname):
 setup(
     name=f"bitsandbytes",
-    version=f"0.35.4",
+    version=f"0.36.0",
     author="Tim Dettmers",
     author_email="dettmers@cs.washington.edu",
     description="8-bit optimizers and matrix multiplication routines.",
...
...