OpenDAS / bitsandbytes
Commit 3901ebf7
Authored Jan 04, 2023 by Tim Dettmers
Added CUDA 12.0 support; removed CC 3.0 support.
Parent: b3de1921
Showing 6 changed files with 92 additions and 22 deletions
CHANGELOG.md: +35 -1
Makefile: +28 -16
README.md: +2 -2
cuda_install.sh: +4 -0
deploy.sh: +22 -2
setup.py: +1 -1
CHANGELOG.md
@@ -139,7 +139,6 @@ Features:
 Bug fixes:
 - Fixed a problem where warning messages would be displayed even though everything worked correctly.
 ### 0.35.2
 Bug fixes:
@@ -155,3 +154,38 @@ Bug fixes:
 Bug fixes:
 - Fixed a bug in the CUDA Setup failed with the cuda runtime was found, but not the cuda library.
 - Fixed a bug where not finding the cuda runtime led to an incomprehensible error.
+### 0.36.0
+#### Improvements, Ada/Hopper support, fake k-bit quantization.
+Features:
+- CUDA 11.8 and 12.0 support added
+- support for Ada and Hopper GPUs added (compute capability 8.9 and 9.0)
+- support for fake k-bit block-wise quantization for Int, Float, quantile quantization, and dynamic exponent data types added
+- Added CUDA instruction generator to fix some installations.
+- Added additional block sizes for quantization {64, 128, 256, 512, 1024}
+- Added SRAM Quantile algorithm to quickly estimate less than 256 quantiles
+- Added option to suppress the bitsandbytes welcome message (@Cyberes)
+Regression:
+- Compute capability 3.0 removed: GTX 600s and 700s series is no longer supported (except GTX 780 and GTX 780 Ti)
+Bug fixes:
+- fixed a bug where too long directory names would crash the CUDA SETUP #35 (@tomaarsen)
+- fixed a bug where CPU installations on Colab would run into an error #34 (@tomaarsen)
+- fixed an issue where the default CUDA version with fast-DreamBooth was not supported #52
+- fixed a bug where the CUDA setup failed due to a wrong function call.
+- fixed a bug in the CUDA Setup which led to an incomprehensible error if no GPU was detected.
+- fixed a bug in the CUDA Setup failed with the cuda runtime was found, but not the cuda library.
+- fixed a bug where not finding the cuda runtime led to an incomprehensible error.
+- fixed a bug where with missing CUDA the default was an error instead of the loading the CPU library
+- fixed a bug where the CC version of the GPU was not detected appropriately (@BlackHC)
+- fixed a bug in CPU quantization which lead to errors when the input buffer exceeded 2^31 elements
+Improvements:
+- multiple improvements in formatting, removal of unused imports, and slight performance improvements (@tomaarsen)
+- StableEmbedding layer now has device and dtype parameters to make it 1:1 replaceable with regular Embedding layers (@lostmsu)
+- runtime performance of block-wise quantization slightly improved
+- added error message for the case multiple libcudart.so are installed and bitsandbytes picks the wrong one
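The changelog mentions block-wise quantization with block sizes {64, 128, 256, 512, 1024}. As a rough illustration of the general idea only, the sketch below quantizes a list of floats to signed integer codes with one absmax scale per block. It is not the bitsandbytes API or its CUDA kernels: the function names `quantize_blockwise` and `dequantize_blockwise`, the pure-Python implementation, and the linear integer code are all assumptions made for this example.

```python
def quantize_blockwise(values, blocksize=64, bits=8):
    """Quantize floats to signed integer codes, one absmax scale per block.

    Hypothetical helper for illustration; not a bitsandbytes function.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits
    codes, scales = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        # Each block is normalized by its own largest magnitude.
        absmax = max(abs(v) for v in block) or 1.0  # all-zero block: avoid /0
        scales.append(absmax)
        codes.extend(round(v / absmax * qmax) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, blocksize=64, bits=8):
    """Map integer codes back to floats using the per-block scales."""
    qmax = 2 ** (bits - 1) - 1
    return [c * scales[i // blocksize] / qmax for i, c in enumerate(codes)]


if __name__ == "__main__":
    data = [0.25, -1.0, 0.25, 2.0]
    codes, scales = quantize_blockwise(data, blocksize=2)
    print(codes, scales)
    print(dequantize_blockwise(codes, scales, blocksize=2))
```

Smaller blocks mean that one outlier distorts fewer neighbouring values, at the cost of storing more scales, which is presumably why a range of block sizes is useful.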
Makefile
@@ -22,12 +22,11 @@ BUILD_DIR:= $(ROOT_DIR)/build
 FILES_CUDA := $(CSRC)/ops.cu $(CSRC)/kernels.cu
 FILES_CPP := $(CSRC)/common.cpp $(CSRC)/cpu_ops.cpp $(CSRC)/pythonInterface.c
-INCLUDE := -I $(CUDA_HOME)/include -I $(ROOT_DIR)/csrc -I $(CONDA_PREFIX)/include -I $(ROOT_DIR)/dependencies/cub -I $(ROOT_DIR)/include
+INCLUDE := -I $(CUDA_HOME)/include -I $(ROOT_DIR)/csrc -I $(CONDA_PREFIX)/include -I $(ROOT_DIR)/include
+INCLUDE_10x := -I $(CUDA_HOME)/include -I $(ROOT_DIR)/csrc -I $(ROOT_DIR)/dependencies/cub -I $(ROOT_DIR)/include
 LIB := -L $(CUDA_HOME)/lib64 -lcudart -lcublas -lcublasLt -lcurand -lcusparse -L $(CONDA_PREFIX)/lib
 # NVIDIA NVCC compilation flags
-COMPUTE_CAPABILITY := -gencode arch=compute_35,code=sm_35 # Kepler
-COMPUTE_CAPABILITY += -gencode arch=compute_37,code=sm_37 # Kepler
 COMPUTE_CAPABILITY += -gencode arch=compute_50,code=sm_50 # Maxwell
 COMPUTE_CAPABILITY += -gencode arch=compute_52,code=sm_52 # Maxwell
 COMPUTE_CAPABILITY += -gencode arch=compute_60,code=sm_60 # Pascal
@@ -35,11 +34,10 @@ COMPUTE_CAPABILITY += -gencode arch=compute_61,code=sm_61 # Pascal
 COMPUTE_CAPABILITY += -gencode arch=compute_70,code=sm_70 # Volta
 COMPUTE_CAPABILITY += -gencode arch=compute_72,code=sm_72 # Volta
-# CUDA 9.2 supports CC 3.0, but CUDA >= 11.0 does not
-CC_CUDA92 := -gencode arch=compute_30,code=sm_30
+CC_KEPLER := -gencode arch=compute_35,code=sm_35 # Kepler
+CC_KEPLER += -gencode arch=compute_37,code=sm_37 # Kepler
 # Later versions of CUDA support the new architectures
-CC_CUDA10x := -gencode arch=compute_30,code=sm_30
 CC_CUDA10x += -gencode arch=compute_75,code=sm_75
 CC_CUDA110 := -gencode arch=compute_75,code=sm_75
@@ -49,6 +47,7 @@ CC_CUDA11x := -gencode arch=compute_75,code=sm_75
 CC_CUDA11x += -gencode arch=compute_80,code=sm_80
 CC_CUDA11x += -gencode arch=compute_86,code=sm_86
 CC_cublasLt110 := -gencode arch=compute_75,code=sm_75
 CC_cublasLt110 += -gencode arch=compute_80,code=sm_80
@@ -56,30 +55,38 @@ CC_cublasLt111 := -gencode arch=compute_75,code=sm_75
 CC_cublasLt111 += -gencode arch=compute_80,code=sm_80
 CC_cublasLt111 += -gencode arch=compute_86,code=sm_86
+CC_ADA_HOPPER := -gencode arch=compute_89,code=sm_89
+CC_ADA_HOPPER += -gencode arch=compute_90,code=sm_90
 all: $(ROOT_DIR)/dependencies/cub $(BUILD_DIR) env
-	$(NVCC) $(COMPUTE_CAPABILITY) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR)
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_KEPLER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR)
-	$(NVCC) $(COMPUTE_CAPABILITY) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_KEPLER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION).so $(LIB)
 cuda92: $(ROOT_DIR)/dependencies/cub $(BUILD_DIR) env
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA92) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA92) $(CC_KEPLER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA92) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA92) $(CC_KEPLER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION)_nocublaslt.so $(LIB)
 cuda10x_nomatmul: $(ROOT_DIR)/dependencies/cub $(BUILD_DIR) env
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA10x) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA10x) $(CC_KEPLER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE_10x) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA10x) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA10x) $(CC_KEPLER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION)_nocublaslt.so $(LIB)
 cuda110_nomatmul: $(BUILD_DIR) env
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA110) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA110) $(CC_KEPLER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA110) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA110) $(CC_KEPLER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION)_nocublaslt.so $(LIB)
 cuda11x_nomatmul: $(BUILD_DIR) env
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) $(CC_KEPLER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
-	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) $(CC_KEPLER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION)_nocublaslt.so $(LIB)
+cuda12x_nomatmul: $(BUILD_DIR) env
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) $(CC_ADA_HOPPER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR) -D NO_CUBLASLT
+	$(NVCC) $(COMPUTE_CAPABILITY) $(CC_CUDA11x) $(CC_ADA_HOPPER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION)_nocublaslt.so $(LIB)
 cuda110: $(BUILD_DIR) env
@@ -92,6 +99,11 @@ cuda11x: $(BUILD_DIR) env
 	$(NVCC) $(CC_cublasLt111) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
 	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION).so $(LIB)
+cuda12x: $(BUILD_DIR) env
+	$(NVCC) $(CC_cublasLt111) $(CC_ADA_HOPPER) -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc $(FILES_CUDA) $(INCLUDE) $(LIB) --output-directory $(BUILD_DIR)
+	$(NVCC) $(CC_cublasLt111) $(CC_ADA_HOPPER) -Xcompiler '-fPIC' -dlink $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o -o $(BUILD_DIR)/link.o
+	$(GPP) -std=c++14 -DBUILD_CUDA -shared -fPIC $(INCLUDE) $(BUILD_DIR)/ops.o $(BUILD_DIR)/kernels.o $(BUILD_DIR)/link.o $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cuda$(CUDA_VERSION).so $(LIB)
 cpuonly: $(BUILD_DIR) env
 	$(GPP) -std=c++14 -shared -fPIC -I $(ROOT_DIR)/csrc -I $(ROOT_DIR)/include $(FILES_CPP) -o ./bitsandbytes/libbitsandbytes_cpu.so
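The Makefile builds each target's architecture list by concatenating groups of `-gencode arch=compute_XX,code=sm_XX` flags. The sketch below reproduces that composition for the `cuda12x` target in Python, purely to make the resulting flag string easy to inspect. The helper name `gencode_flags` and the Python list names are hypothetical; only the compute-capability numbers come from the diff.

```python
def gencode_flags(capabilities):
    """Build the nvcc -gencode flags for a list of compute capabilities.

    Hypothetical helper; it only mirrors the flag pattern used in the Makefile.
    """
    return " ".join(
        f"-gencode arch=compute_{cc},code=sm_{cc}" for cc in capabilities
    )


# Groups as they appear in the Makefile after this commit.
CC_KEPLER = ["35", "37"]
CC_CUBLASLT111 = ["75", "80", "86"]
CC_ADA_HOPPER = ["89", "90"]

# The cuda12x target passes $(CC_cublasLt111) $(CC_ADA_HOPPER) to nvcc.
CUDA12X = CC_CUBLASLT111 + CC_ADA_HOPPER

if __name__ == "__main__":
    print(gencode_flags(CUDA12X))
```

Note that `cuda12x` does not include the Kepler group, whereas the `*_nomatmul` targets for CUDA 9.2 through 11.x append `$(CC_KEPLER)` explicitly.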
README.md
@@ -50,9 +50,9 @@ Requirements: anaconda, cudatoolkit, pytorch
 Hardware requirements:
  - LLM.int8(): NVIDIA Turing (RTX 20xx; T4) or Ampere GPU (RTX 30xx; A4-A100); (a GPU from 2018 or older).
- - 8-bit optimizers and quantization: NVIDIA Maxwell GPU or newer (>=GTX 9XX).
+ - 8-bit optimizers and quantization: NVIDIA Kepler GPU or newer (>=GTX 78X).
-Supported CUDA versions: 10.2 - 11.7
+Supported CUDA versions: 10.2 - 12.0
 The bitsandbytes library is currently only supported on Linux distributions. Windows is not supported at the moment.
cuda_install.sh
@@ -11,6 +11,7 @@ URL115=https://developer.download.nvidia.com/compute/cuda/11.5.2/local_installer
 URL116=https://developer.download.nvidia.com/compute/cuda/11.6.2/local_installers/cuda_11.6.2_510.47.03_linux.run
 URL117=https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
 URL118=https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
+URL120=https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda_12.0.0_525.60.13_linux.run
 CUDA_VERSION=$1
@@ -56,6 +57,9 @@ if [[ -n "$CUDA_VERSION" ]]; then
   elif [[ "$CUDA_VERSION" -eq "118" ]]; then
     URL=$URL118
     FOLDER=cuda-11.8
+  elif [[ "$CUDA_VERSION" -eq "120" ]]; then
+    URL=$URL120
+    FOLDER=cuda-12.0
   else
     echo "argument error: No cuda version passed as input. Choose among: {111, 115}"
   fi
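The script selects the installer through a growing `elif` chain, and its error message still lists only `{111, 115}` even though more versions are handled. One way to keep the accepted versions and the message in sync is a single lookup table, sketched below in Python. The names `CUDA_INSTALLERS` and `resolve_installer` are hypothetical, and only the two entries visible in this diff are filled in.

```python
# Version code -> (install folder, installer URL), copied from the diff above.
CUDA_INSTALLERS = {
    "118": (
        "cuda-11.8",
        "https://developer.download.nvidia.com/compute/cuda/11.8.0/"
        "local_installers/cuda_11.8.0_520.61.05_linux.run",
    ),
    "120": (
        "cuda-12.0",
        "https://developer.download.nvidia.com/compute/cuda/12.0.0/"
        "local_installers/cuda_12.0.0_525.60.13_linux.run",
    ),
}


def resolve_installer(version):
    """Return (folder, url) for a version code such as "120".

    Hypothetical helper: the list of valid choices in the error message is
    derived from the table, so it cannot drift from the supported versions.
    """
    try:
        return CUDA_INSTALLERS[version]
    except KeyError:
        choices = ", ".join(sorted(CUDA_INSTALLERS))
        raise ValueError(
            f"unsupported CUDA version {version!r}; choose among: {{{choices}}}"
        ) from None


if __name__ == "__main__":
    folder, url = resolve_installer("120")
    print(folder, url)
```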
deploy.sh
@@ -110,7 +110,7 @@ fi
 make clean
 export CUDA_HOME=$BASE_PATH/cuda-11.8
-make cuda11x CUDA_VERSION=118
+make cuda12x CUDA_VERSION=118
 if [ ! -f "./bitsandbytes/libbitsandbytes_cuda118.so" ]; then
   # Control will enter here if $DIRECTORY doesn't exist.
@@ -118,6 +118,16 @@ if [ ! -f "./bitsandbytes/libbitsandbytes_cuda118.so" ]; then
   exit 64
 fi
+make clean
+export CUDA_HOME=$BASE_PATH/cuda-12.0
+make cuda12x CUDA_VERSION=120
+if [ ! -f "./bitsandbytes/libbitsandbytes_cuda120.so" ]; then
+  # Control will enter here if $DIRECTORY doesn't exist.
+  echo "Compilation unsuccessul!" 1>&2
+  exit 64
+fi
 make clean
 export CUDA_HOME=$BASE_PATH/cuda-10.2
@@ -213,7 +223,7 @@ fi
 make clean
 export CUDA_HOME=$BASE_PATH/cuda-11.8
-make cuda11x_nomatmul CUDA_VERSION=118
+make cuda12x_nomatmul CUDA_VERSION=118
 if [ ! -f "./bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so" ]; then
   # Control will enter here if $DIRECTORY doesn't exist.
@@ -221,5 +231,15 @@ if [ ! -f "./bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so" ]; then
   exit 64
 fi
+make clean
+export CUDA_HOME=$BASE_PATH/cuda-12.0
+make cuda12x_nomatmul CUDA_VERSION=120
+if [ ! -f "./bitsandbytes/libbitsandbytes_cuda120_nocublaslt.so" ]; then
+  # Control will enter here if $DIRECTORY doesn't exist.
+  echo "Compilation unsuccessul!" 1>&2
+  exit 64
+fi
 python -m build
 python -m twine upload dist/* --verbose
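The deploy script repeats the same "build, then check that the `.so` exists" block for every CUDA version. A compact way to express the final check is to compute the expected file names and report any that are missing before uploading. The sketch below assumes, as the diff shows for 118 and 120, that each version yields both a regular and a `_nocublaslt` library; the function names are hypothetical and other versions may produce only one variant.

```python
from pathlib import Path


def expected_libraries(cuda_versions):
    """List the shared-library names expected for each CUDA version code."""
    names = []
    for version in cuda_versions:
        names.append(f"libbitsandbytes_cuda{version}.so")
        names.append(f"libbitsandbytes_cuda{version}_nocublaslt.so")
    return names


def missing_libraries(directory, cuda_versions):
    """Return the expected library names that are not present in directory."""
    root = Path(directory)
    return [
        name
        for name in expected_libraries(cuda_versions)
        if not (root / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_libraries("./bitsandbytes", ["118", "120"])
    if missing:
        # Same exit code the shell script uses on a failed build.
        raise SystemExit(f"Compilation unsuccessful, missing: {missing}")
```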
setup.py
@@ -18,7 +18,7 @@ def read(fname):
 setup(
     name=f"bitsandbytes",
-    version=f"0.35.4",
+    version=f"0.36.0",
     author="Tim Dettmers",
     author_email="dettmers@cs.washington.edu",
     description="8-bit optimizers and matrix multiplication routines.",