Unverified commit 41faf4e3, authored by Tim Dettmers, committed by GitHub

Merge branch 'main' into main

parents fea5bc7b 095f7a56
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve bitsandbytes
body:
- type: textarea
id: system-info
attributes:
label: System Info
description: Please share your relevant system information with us
placeholder: platform, python version, hardware, ...
validations:
required: true
- type: textarea
id: reproduction
validations:
required: true
attributes:
label: Reproduction
description: |
Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
Please provide the simplest reproducer as possible so that we can quickly fix the issue.
placeholder: |
Reproducer:
- type: textarea
id: expected-behavior
validations:
required: true
attributes:
label: Expected behavior
description: "A clear and concise description of what you would expect to happen."
\ No newline at end of file
name: "\U0001F680 Feature request"
description: Submit a proposal/request for a new feature
labels: [ "feature" ]
body:
- type: textarea
id: feature-request
validations:
required: true
attributes:
label: Feature request
description: |
A clear and concise description of the feature proposal.
- type: textarea
id: motivation
validations:
required: true
attributes:
label: Motivation
description: |
Please outline the motivation for the proposal. Is your feature request related to a problem?
- type: textarea
id: contribution
validations:
required: true
attributes:
label: Your contribution
description: |
Is there any way that you could help, e.g. by submitting a PR?
\ No newline at end of file
name: Stale Bot
on:
  schedule:
    - cron: "0 15 * * *"
jobs:
  close_stale_issues:
    name: Close Stale Issues
    if: github.repository == 'TimDettmers/bitsandbytes'
    runs-on: ubuntu-latest
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.8
      - name: Install requirements
        run: |
          pip install PyGithub
      - name: Close stale issues
        run: |
          python scripts/stale.py
\ No newline at end of file
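`scripts/stale.py` itself is not part of this commit. As a rough sketch of what such a script typically does with PyGithub (the 30-day window, comment text, and filtering below are assumptions, not the repository's actual logic):

```python
import os
from datetime import datetime, timedelta, timezone

from github import Github  # PyGithub, installed in the step above

# Hypothetical staleness policy: close open issues untouched for 30 days.
g = Github(os.environ["GITHUB_TOKEN"])
repo = g.get_repo("TimDettmers/bitsandbytes")
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

for issue in repo.get_issues(state="open"):
    if issue.pull_request is not None:
        continue  # the Issues API also yields pull requests; skip them
    if issue.updated_at < cutoff:  # PyGithub >= 2 returns timezone-aware datetimes
        issue.create_comment("This issue was marked stale and closed due to inactivity.")
        issue.edit(state="closed")
```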
@@ -133,3 +133,4 @@ dmypy.json
dependencies
cuda_build
.vscode/*
[style]
ALIGN_CLOSING_BRACKET_WITH_VISUAL_INDENT = True
ALLOW_MULTILINE_LAMBDAS = True
BLANK_LINE_BEFORE_NESTED_CLASS_OR_DEF = True
COLUMN_LIMIT = 88
COALESCE_BRACKETS = True
SPACE_BETWEEN_ENDING_COMMA_AND_CLOSING_BRACKET = True
SPACES_BEFORE_COMMENT = 2
SPLIT_BEFORE_BITWISE_OPERATOR = True
SPLIT_BEFORE_FIRST_ARGUMENT = True
SPLIT_BEFORE_LOGICAL_OPERATOR = True
SPLIT_BEFORE_NAMED_ASSIGNS = True
SPLIT_COMPLEX_COMPREHENSION = True
\ No newline at end of file
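Assuming this file is the repository's `.style.yapf` (one of the config locations yapf searches), the configuration can be applied with, for example:

```bash
# format the package and tests in place using the [style] section above
yapf --in-place --recursive bitsandbytes/ tests/
```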
@@ -311,7 +311,19 @@ User experience:
Performance:
- improved 4-bit inference performance for A100 GPUs. This degraded performance for A40/RTX3090 and RTX 4090 GPUs slightly.
### 0.41.1
Bug fixes:
- Fixed bugs in dynamic exponent data type creation. Thank you @RossM, @KohakuBlueleaf, @ArrowM #659 #227 #262 #152
### 0.41.2
Feature:
- 4-bit serialization now supported. This enables 4-bit load/store. Thank you @poedator #753
### 0.41.3
Bug fixes:
- Fixed an issue where 4-bit serialization would fail for layers without double quantization #868. Thank you, @poedator
- Fixed an issue where calling .to() or .cuda() on a 4-bit layer twice would result in an error #867. Thank you, @jph00
@@ -38,7 +38,7 @@ python setup.py install
```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    'decapoda-research/llama-7b-hf',
    device_map='auto',
    load_in_8bit=True,
    max_memory=f'{int(torch.cuda.mem_get_info()[0]/1024**3)-2}GB')
@@ -146,13 +146,13 @@ For upcoming features and changes and full history see [Patch Notes](CHANGELOG.m
To compile from source, you need an installation of CUDA. If `nvcc` is not installed, you can install the CUDA Toolkit with nvcc through the following commands.

```bash
wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/install_cuda.sh
# Syntax cuda_install CUDA_VERSION INSTALL_PREFIX EXPORT_TO_BASH
# CUDA_VERSION in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120, 121, 122}
# EXPORT_TO_BASH in {0, 1} with 0=False and 1=True
# For example, the following installs CUDA 11.7 to ~/local/cuda-11.7 and exports the path to your .bashrc
bash install_cuda.sh 117 ~/local 1
```
To use a specific CUDA version just for a single compile run, you can set the variable `CUDA_HOME`, for example the following command compiles `libbitsandbytes_cuda117.so` using compiler flags for cuda11x with the cuda version at `~/local/cuda-11.7`:
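The concrete compile command is collapsed in this view; based on the surrounding sentence it is presumably of the following form (the `cuda11x` Makefile target is inferred from the text above, so treat the exact invocation as an assumption):

```bash
CUDA_HOME=~/local/cuda-11.7 CUDA_VERSION=117 make cuda11x
```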
@@ -496,7 +496,7 @@ class MatMul4Bit(torch.autograd.Function):
    # backward is mostly the same, but adds one extra clause (see "elif state.CxB is not None")
    @staticmethod
    def forward(ctx, A, B, out=None, bias=None, quant_state: F.QuantState = None):
        # default of pytorch behavior if inputs are empty
        ctx.is_empty = False
        if prod(A.shape) == 0:
@@ -504,7 +504,7 @@ class MatMul4Bit(torch.autograd.Function):
            ctx.A = A
            ctx.B = B
            ctx.bias = bias
            B_shape = quant_state.shape
            if A.shape[-1] == B_shape[0]:
                return torch.empty(A.shape[:-1] + B_shape[1:], dtype=A.dtype, device=A.device)
            else:
@@ -513,10 +513,10 @@ class MatMul4Bit(torch.autograd.Function):
        # 1. Dequantize
        # 2. MatmulnN
        output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)

        # 3. Save state
        ctx.state = quant_state
        ctx.dtype_A, ctx.dtype_B, ctx.dtype_bias = A.dtype, B.dtype, None if bias is None else bias.dtype

        if any(ctx.needs_input_grad[:2]):
@@ -534,7 +534,6 @@ class MatMul4Bit(torch.autograd.Function):
        req_gradA, _, _, req_gradBias, _ = ctx.needs_input_grad
        A, B = ctx.tensors

        grad_A, grad_B, grad_bias = None, None, None
@@ -563,12 +562,11 @@ def matmul(
    return MatMul8bitLt.apply(A, B, out, bias, state)

def matmul_4bit(A: tensor, B: tensor, quant_state: F.QuantState, out: tensor = None, bias=None):
    assert quant_state is not None
    if A.numel() == A.shape[-1] and A.requires_grad == False:
        if A.shape[-1] % quant_state.blocksize != 0:
            warn(f'Some matrices hidden dimension is not a multiple of {quant_state.blocksize} and efficient inference kernels are not supported for these (slow). Matrix input size found: {A.shape}')
            return MatMul4Bit.apply(A, B, out, bias, quant_state)
        else:
            out = F.gemv_4bit(A, B.t(), out, state=quant_state)
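For orientation, a minimal usage sketch of this 4-bit path (shapes are arbitrary): `Linear4bit` dispatches to `matmul_4bit`, which picks `MatMul4Bit` for batched or grad-requiring inputs and the fused `gemv_4bit` kernel for single-token inference.

```python
import torch
import bitsandbytes as bnb

# A 4-bit NF4 linear layer; the weight is quantized when moved to the GPU.
layer = bnb.nn.Linear4bit(64, 128, quant_type='nf4', compute_dtype=torch.float16).cuda()

x = torch.randn(4, 64, dtype=torch.float16, device='cuda')
y = layer(x)  # routes through bnb.matmul_4bit -> MatMul4Bit (batched input)
print(y.shape)  # torch.Size([4, 128])
```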
@@ -8,6 +8,7 @@ def to_be_ignored(env_var: str, value: str) -> bool:
        "OLDPWD",
        "SSH_AUTH_SOCK",  # SSH stuff, therefore unrelated
        "SSH_TTY",
        "GOOGLE_VM_CONFIG_LOCK_FILE",  # on GCP setups, requires elevated permissions, causing problems in Jupyter notebooks
        "HOME",  # Linux shell default
        "TMUX",  # Terminal Multiplexer
        "XDG_DATA_DIRS",  # XDG: Desktop environment stuff
@@ -67,6 +67,7 @@ class CUDASetup:
            self.add_log_entry('CUDA SETUP: Solution 2a): Download CUDA install script: wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh')
            self.add_log_entry('CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.')
            self.add_log_entry('CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local')
            return
        make_cmd = f'CUDA_VERSION={self.cuda_version_string}'
@@ -2,7 +2,7 @@
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
from typing import Any, Dict, Optional, TypeVar, Union, overload
import warnings

import torch
@@ -10,7 +10,7 @@ import torch.nn.functional as F
from torch import Tensor, device, dtype, nn

import bitsandbytes as bnb
from bitsandbytes.functional import QuantState
from bitsandbytes.autograd._functions import undo_layout, get_tile_inds
from bitsandbytes.optim import GlobalOptimManager
from bitsandbytes.utils import OutlierTracer, find_outlier_dims
@@ -139,8 +139,10 @@ class Embedding(torch.nn.Embedding):
        return emb

class Params4bit(torch.nn.Parameter):

    def __new__(cls, data: Optional[torch.Tensor] = None, requires_grad=True, quant_state: QuantState = None, blocksize: int = 64, compress_statistics: bool = True, quant_type: str = 'fp4') -> "Params4bit":
        if data is None:
            data = torch.empty(0)
@@ -152,6 +154,16 @@ class Params4bit(torch.nn.Parameter):
        self.data = data
        return self

    @classmethod
    def from_prequantized(cls, data: torch.Tensor, quantized_stats: Dict[str, Any], requires_grad: bool = False, device='cuda', **kwargs) -> "Params4bit":
        self = torch.Tensor._make_subclass(cls, data.to(device))
        self.requires_grad = requires_grad
        self.quant_state = QuantState.from_dict(qs_dict=quantized_stats, device=device)
        self.blocksize = self.quant_state.blocksize
        self.compress_statistics = self.quant_state.nested
        self.quant_type = self.quant_state.quant_type
        return self

    def cuda(self, device):
        w = self.data.contiguous().half().cuda(device)
        w_4bit, quant_state = bnb.functional.quantize_4bit(w, blocksize=self.blocksize, compress_statistics=self.compress_statistics, quant_type=self.quant_type)
@@ -178,33 +190,23 @@ class Params4bit(torch.nn.Parameter):
        if (device is not None and device.type == "cuda" and self.data.device.type == "cpu"):
            return self.cuda(device)
        else:
            if self.quant_state is not None:
                self.quant_state.to(device)

            new_param = Params4bit(super().to(device=device, dtype=dtype, non_blocking=non_blocking),
                                   requires_grad=self.requires_grad, quant_state=self.quant_state,
                                   blocksize=self.blocksize, compress_statistics=self.compress_statistics,
                                   quant_type=self.quant_type)

            return new_param
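This simplification appears to be the fix referenced in the changelog as #867: the `QuantState` now moves with the parameter, so repeated device moves are safe. A quick sanity check (sketch):

```python
import bitsandbytes as bnb

layer = bnb.nn.Linear4bit(64, 128)
layer = layer.to('cuda')  # first move quantizes the weight
layer = layer.to('cuda')  # previously raised; now just relocates the quant_state
```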
class Linear4bit(nn.Linear):

    def __init__(self, input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_type='fp4', device=None):
        super().__init__(input_features, output_features, bias, device)
        self.weight = Params4bit(self.weight.data, requires_grad=False, compress_statistics=compress_statistics, quant_type=quant_type)
        # self.persistent_buffers = []  # TODO consider as way to save quant state
        self.compute_dtype = compute_dtype
        self.compute_type_is_set = False
@@ -224,10 +226,16 @@ class Linear4bit(nn.Linear):
            warnings.warn(f'Input type into Linear4bit is torch.float16, but bnb_4bit_compute_type=torch.float32 (default). This will lead to slow inference or training speed.')
            warnings.filterwarnings('ignore', message='.*inference or training')

    def _save_to_state_dict(self, destination, prefix, keep_vars):
        """
        save weight and bias,
        then fill state_dict with components of quant_state
        """
        super()._save_to_state_dict(destination, prefix, keep_vars)  # saving weight and bias

        if getattr(self.weight, "quant_state", None) is not None:
            for k, v in self.weight.quant_state.as_dict(packed=True).items():
                destination[prefix + "weight." + k] = v if keep_vars else v.detach()
    def forward(self, x: torch.Tensor):
        # weights are cast automatically as Int8Params, but the bias has to be cast manually
@@ -251,10 +259,12 @@ class Linear4bit(nn.Linear):
        return out

class LinearFP4(Linear4bit):
    def __init__(self, input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, device=None):
        super().__init__(input_features, output_features, bias, compute_dtype, compress_statistics, 'fp4', device)

class LinearNF4(Linear4bit):
    ''' Implements the NF4 data type.
@@ -266,11 +276,10 @@ class LinearNF4(Linear4bit):
    Implementation of the NF4 data type in bitsandbytes can be found in the `create_normal_map` function in
    the `functional.py` file: https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/functional.py#L236.
    '''
    def __init__(self, input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, device=None):
        super().__init__(input_features, output_features, bias, compute_dtype, compress_statistics, 'nf4', device)

class Int8Params(torch.nn.Parameter):
    def __new__(
        cls,
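Taken together, `_save_to_state_dict` and `Params4bit.from_prequantized` form the 4-bit load/store path from the changelog (#753, #868). A hand-rolled round trip might look like this sketch (integrations such as transformers do this reassembly for you; the `weight.*` key layout follows `_save_to_state_dict` above):

```python
import torch
import bitsandbytes as bnb
from bitsandbytes.nn import Params4bit

# Save: the packed 4-bit weight plus its quant_state components.
layer = bnb.nn.Linear4bit(64, 128, quant_type='nf4').cuda()
torch.save(layer.state_dict(), '/tmp/linear4bit.pt')

# Load: collect the "weight.*" entries and rebuild the parameter.
sd = torch.load('/tmp/linear4bit.pt')
quant_stats = {k[len('weight.'):]: v for k, v in sd.items() if k.startswith('weight.')}
param = Params4bit.from_prequantized(data=sd['weight'], quantized_stats=quant_stats, device='cuda')
```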
import json
import shlex
import subprocess
import torch
@@ -99,45 +100,6 @@ def find_outlier_dims(weight, reduction_dim=0, zscore=4.0, topk=None, rdm=False)
    return idx
def execute_and_return(command_string: str) -> Tuple[str, str]:
    def _decode(subprocess_err_out_tuple):
@@ -197,3 +159,36 @@ def replace_linear(model, linear_replacement, skip_modules=["lm_head"], copy_wei
            if func is not None: func(module)
    return model
def pack_dict_to_tensor(source_dict):
    """
    Pack a dictionary into a torch tensor for storing quant_state items in state_dict.

    Parameters:
    - source_dict: The dictionary to be packed.

    Returns:
    A torch tensor containing the packed data.
    """
    json_str = json.dumps(source_dict)
    json_bytes = json_str.encode('utf-8')
    tensor_data = torch.tensor(list(json_bytes), dtype=torch.uint8)

    return tensor_data

def unpack_tensor_to_dict(tensor_data):
    """
    Unpack a torch tensor into a Python dictionary.

    Parameters:
    - tensor_data: The torch tensor containing the packed data.

    Returns:
    A Python dictionary containing the unpacked data.
    """
    json_bytes = bytes(tensor_data.numpy())
    json_str = json_bytes.decode('utf-8')
    unpacked_dict = json.loads(json_str)

    return unpacked_dict
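A quick round trip shows the contract of this pair; the metadata dict below is illustrative, any JSON-serializable dict works:

```python
import torch
from bitsandbytes.utils import pack_dict_to_tensor, unpack_tensor_to_dict

meta = {"quant_type": "nf4", "blocksize": 64, "shape": [128, 64]}

packed = pack_dict_to_tensor(meta)        # 1-D uint8 tensor of UTF-8 JSON bytes
assert packed.dtype == torch.uint8

restored = unpack_tensor_to_dict(packed)  # decodes back to the original dict
assert restored == meta
```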
@@ -9,13 +9,13 @@ To run these steps you will need to have the nvcc compiler installed that comes
You can install CUDA locally without sudo by following these steps:

```bash
wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/install_cuda.sh
# Syntax cuda_install CUDA_VERSION INSTALL_PREFIX EXPORT_TO_BASH
# CUDA_VERSION in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120, 121}
# EXPORT_TO_BASH in {0, 1} with 0=False and 1=True
# For example, the following installs CUDA 11.7 to ~/local/cuda-11.7 and exports the path to your .bashrc
bash install_cuda.sh 117 ~/local 1
```
By default, the Makefile will look at your `CUDA_HOME` environmental variable to find your CUDA version for compiling the library. If this path is not set it is inferred from the path of your `nvcc` compiler.
name: bnb
channels:
  - pytorch
  - nvidia
  - conda-forge
dependencies:
  # Base
  - conda-forge::python=3.8
  - pytorch::pytorch>=2.1
  - pytorch::pytorch-cuda=11.8
  - nvidia::cuda=11.8
  # Libraries
  - conda-forge::accelerate
  - conda-forge::einops
  - conda-forge::scipy
  - conda-forge::transformers
  # Development
  - conda-forge::pytest
  - conda-forge::build  # build Python packages
  - conda-forge::twine  # upload Python packages
  - conda-forge::pytest-cases  # more readable and composable parametrized tests
  - conda-forge::ipython  # better interactive shell
  - conda-forge::debugpy  # debugger support for VSCode
  - conda-forge::ruff  # linting
  - conda-forge::yapf  # code formatting
  - conda-forge::monkeytype  # infer type annotations
  - conda-forge::rich  # better, colored tracebacks, etc.
  - conda-forge::pytest-sugar  # better pytest output

## ENV CREATION - steps to reproduce:
# mamba env remove -n bnb
# mamba create -y -n bnb python=3.8  # creating an empty env bypasses conda
# #   and leads to much faster env resolution in the next step, see https://github.com/mamba-org/mamba/issues/633#issuecomment-812272143
# mamba env update -n bnb -f environment.yml
# mamba activate bnb

## PIP dependencies (install *after* ENV CREATION):
# pip install --no-cache-dir --no-deps lion_pytorch triton peft
## NOTE: conda peft is not up to date, so we install from pip
# pip install -e .  # installs bitsandbytes as an editable development install (run from within the repo root dir)

## ENV UPDATE:
# # add new packages to environment.yml, then:
# mamba env update -n bnb -f environment.yml
\ No newline at end of file
@@ -15,13 +15,13 @@ where XX.X is the CUDA version number.
You can also install the CUDA version that you need locally with a script provided by bitsandbytes as follows:

```bash
wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/install_cuda.sh
# Syntax cuda_install CUDA_VERSION INSTALL_PREFIX EXPORT_TO_BASH
# CUDA_VERSION in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120, 121, 122}
# EXPORT_TO_BASH in {0, 1} with 0=False and 1=True
# For example, the following installs CUDA 11.7 to ~/local/cuda-11.7 and exports the path to your .bashrc
bash install_cuda.sh 117 ~/local 1
```

## Setting the environmental variables BNB_CUDA_VERSION and LD_LIBRARY_PATH
import os
import sys
import subprocess
from urllib.request import urlretrieve

cuda_versions = {
    "92": "https://developer.nvidia.com/compute/cuda/9.2/Prod2/local_installers/cuda_9.2.148_396.37_linux",
    "100": "https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux",
    "101": "https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.105_418.39_linux.run",
    "102": "https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run",
    "110": "https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda_11.0.3_450.51.06_linux.run",
    "111": "https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda_11.1.1_455.32.00_linux.run",
    "112": "https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run",
    "113": "https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run",
    "114": "https://developer.download.nvidia.com/compute/cuda/11.4.4/local_installers/cuda_11.4.4_470.82.01_linux.run",
    "115": "https://developer.download.nvidia.com/compute/cuda/11.5.2/local_installers/cuda_11.5.2_495.29.05_linux.run",
    "116": "https://developer.download.nvidia.com/compute/cuda/11.6.2/local_installers/cuda_11.6.2_510.47.03_linux.run",
    "117": "https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run",
    "118": "https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run",
    "120": "https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda_12.0.0_525.60.13_linux.run",
    "121": "https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run",
    "122": "https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run",
    "123": "https://developer.download.nvidia.com/compute/cuda/12.3.1/local_installers/cuda_12.3.1_545.23.08_linux.run",
}

def install_cuda(version, base_path, download_path):
    formatted_version = f"{version[:-1]}.{version[-1]}"
    folder = f"cuda-{formatted_version}"
    install_path = os.path.join(base_path, folder)

    if os.path.exists(install_path):
        print(f"Removing existing CUDA version {version} at {install_path}...")
        subprocess.run(["rm", "-rf", install_path], check=True)

    url = cuda_versions[version]
    filename = url.split('/')[-1]
    filepath = os.path.join(download_path, filename)

    if not os.path.exists(filepath):
        print(f"Downloading CUDA version {version} from {url}...")
        urlretrieve(url, filepath)
    else:
        print(f"Installer for CUDA version {version} already downloaded.")

    # Make the installer executable
    subprocess.run(["chmod", "+x", filepath], check=True)

    # Install CUDA
    print(f"Installing CUDA version {version}...")
    install_command = [
        "bash", filepath,
        "--no-drm", "--no-man-page", "--override",
        "--toolkitpath=" + install_path, "--toolkit", "--silent"
    ]

    print(f"Running command: {' '.join(install_command)}")

    try:
        subprocess.run(install_command, check=True)
    except subprocess.CalledProcessError as e:
        print(f"Installation failed for CUDA version {version}: {e}")
        return
    finally:
        # Delete the installer file
        os.remove(filepath)

    print(f"CUDA version {version} installed at {install_path}")

def main():
    user_base_path = os.path.expanduser("~/cuda")
    system_base_path = "/usr/local/cuda"
    base_path = user_base_path  # default to user-specific installation
    download_path = "/tmp"  # default download path

    if len(sys.argv) < 2:
        print("Usage: python install_cuda.py <version/all> [user/system] [download_path]")
        sys.exit(1)

    version = sys.argv[1]
    if len(sys.argv) > 2:
        base_path = system_base_path if sys.argv[2] == "system" else user_base_path
    if len(sys.argv) > 3:
        download_path = sys.argv[3]

    if not os.path.exists(base_path):
        os.makedirs(base_path)
    if not os.path.exists(download_path):
        os.makedirs(download_path)

    # Install CUDA version(s)
    if version == "all":
        for ver in cuda_versions.keys():
            install_cuda(ver, base_path, download_path)
    elif version in cuda_versions:
        install_cuda(version, base_path, download_path)
    else:
        print(f"Invalid CUDA version: {version}. Available versions are: {', '.join(cuda_versions.keys())}")
        sys.exit(1)

if __name__ == "__main__":
    main()
\ No newline at end of file
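Example invocations of the script above, following the defaults in `main()`:

```bash
# install CUDA 12.1 under ~/cuda/cuda-12.1, caching the installer in /tmp
python install_cuda.py 121

# system-wide install under /usr/local/cuda, with a custom download directory
python install_cuda.py 121 system /scratch/downloads
```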
@@ -14,6 +14,7 @@ URL118=https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installer
URL120=https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda_12.0.0_525.60.13_linux.run
URL121=https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
URL122=https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
URL123=https://developer.download.nvidia.com/compute/cuda/12.3.1/local_installers/cuda_12.3.1_545.23.08_linux.run

CUDA_VERSION=$1
@@ -69,11 +70,14 @@ if [[ -n "$CUDA_VERSION" ]]; then
    elif [[ "$CUDA_VERSION" -eq "122" ]]; then
        URL=$URL122
        FOLDER=cuda-12.2
    elif [[ "$CUDA_VERSION" -eq "123" ]]; then
        URL=$URL123
        FOLDER=cuda-12.3
    else
        echo "argument error: No cuda version passed as input. Choose among versions 92 to 123"
    fi
else
    echo "argument error: No cuda version passed as input. Choose among versions 92 to 123"
fi

FILE=$(basename $URL)
@@ -4,3 +4,34 @@ requires = [
    "wheel"
]
build-backend = "setuptools.build_meta"

[tool.ruff]
src = [
    "bitsandbytes",
    "tests",
    "benchmarking"
]
fix = true
select = [
    "A",    # prevent using keywords that clobber python builtins
    "B",    # bugbear: security warnings
    "E",    # pycodestyle
    "F",    # pyflakes
    "I",    # isort
    "ISC",  # implicit string concatenation
    "UP",   # alert you when better syntax is available in your python version
    "RUF",  # the ruff developer's own rules
]
target-version = "py38"
ignore = [
    "E712",  # Allow using if x == False, as it's not always equivalent to if x.
    "E501",  # Suppress line-too-long warnings: trust yapf's judgement on this one.
    "F401",  # unused imports
]
ignore-init-module-imports = true  # allow to expose in __init__.py via imports

[tool.ruff.isort]
combine-as-imports = true
detect-same-package = true
force-sort-within-sections = true
known-first-party = ["bitsandbytes"]
\ No newline at end of file
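With `fix = true`, running the linter applies safe autofixes directly; a typical invocation with current ruff:

```bash
ruff check bitsandbytes tests  # fix = true applies safe autofixes automatically
```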
[pytest]
addopts = -rP
; --cov=bitsandbytes
; # contexts: record which test ran which line; can be seen in html coverage report
; --cov-context=test
; --cov-report html
log_cli = True
log_cli_level = INFO
log_file = logs/pytest.log
\ No newline at end of file
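With this configuration, a plain run already prints captured output for passing tests (`-rP`) and mirrors INFO-level logs to the console and to `logs/pytest.log`; for example:

```bash
pytest tests/ -k 4bit  # -k narrows the run to tests whose names match "4bit"
```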