Unverified Commit 198d08fc authored by Matthew Douglas, committed by GitHub

Documentation Cleanup (#1644)

* Start cleaning up docs

* Remove page

* Minor update

* correction

* Minor doc revisions

* Update installation.mdx

* Update _toctree.yml
parent 9f858294
@@ -2,18 +2,15 @@
   sections:
   - local: index
     title: bitsandbytes
-  - local: quickstart
-    title: Quickstart
   - local: installation
     title: Installation
-- title: Guides
+  - local: quickstart
+    title: Quickstart
+- title: Usage Guides
   sections:
   - local: optimizers
     title: 8-bit optimizers
-  - local: algorithms
-    title: Algorithms
-  - local: non_cuda_backends
-    title: Non-CUDA compute backends
   - local: fsdp_qlora
     title: FSDP-QLoRA
   - local: integrations
@@ -56,7 +53,7 @@
     title: RMSprop
   - local: reference/optim/sgd
     title: SGD
-- title: k-bit quantizers
+- title: Modules
   sections:
   - local: reference/nn/linear8bit
     title: LLM.int8()
# Other algorithms
_WIP: Still incomplete... Community contributions are greatly welcome!_
This is an overview of parts of the `bnb.functional` API in `bitsandbytes` that we think are also useful as standalone entities.
## Using Int8 Matrix Multiplication
For straight Int8 matrix multiplication without mixed precision decomposition, you can use `bnb.matmul(...)`. To enable mixed precision decomposition, use the `threshold` parameter:
```py
bnb.matmul(..., threshold=6.0)
```
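Putting the pieces together, here is a minimal sketch of both call styles. The tensor shapes, the CUDA device, and the random weights are illustrative assumptions rather than part of the original snippet; only the `threshold` keyword comes from the text above.

```py
import torch
import bitsandbytes as bnb

# Illustrative inputs: fp16 activations A and weights B on a CUDA device,
# laid out as [batch, in_features] x [out_features, in_features].
A = torch.randn(4, 1024, dtype=torch.float16, device="cuda")
B = torch.randn(768, 1024, dtype=torch.float16, device="cuda")

# Straight Int8 matrix multiplication (no mixed precision decomposition).
out = bnb.matmul(A, B)

# With mixed precision decomposition: outlier features whose magnitude exceeds
# the threshold are computed in fp16 instead of Int8.
out_mixed = bnb.matmul(A, B, threshold=6.0)
```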
-# Contributors guidelines
-... still under construction ... (feel free to propose materials, `bitsandbytes` is a community project)
+# Contribution Guide
 ## Setup
@@ -3,5 +3,3 @@
 Please submit your questions in [this Github Discussion thread](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1013) if you feel that they will likely affect a lot of other users and that they haven't been sufficiently covered in the documentation.
 We'll pick the most generally applicable ones and post the QAs here or integrate them into the general documentation (also feel free to submit doc PRs, please).
-# ... under construction ...
# Multi-backend support (non-CUDA backends)
> [!Tip]
> If you feel these docs need additional information, please consider submitting a PR, or respectfully request the missing information in one of the GitHub discussion threads mentioned below.
As part of a recent refactoring effort, we will soon offer official multi-backend support. Currently, this feature is available in a preview alpha release, allowing us to gather early feedback from users to improve the functionality and identify any bugs.
At present, the Intel CPU and AMD ROCm backends are considered fully functional. The Intel XPU backend has limited functionality and is less mature.
Please refer to the [installation instructions](./installation#multi-backend) for details on installing the backend you intend to test (and hopefully provide feedback on).
> [!Tip]
> Apple Silicon support is planned for Q4 2024. We are actively seeking contributors to help implement this, develop a concrete plan, and create a detailed list of requirements. Due to limited resources, we rely on community contributions for this implementation effort. To discuss further, please share your thoughts in [this GitHub discussion](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1340) and tag `@Titus-von-Koeller` and `@matthewdouglas`. Thank you!
## Alpha Release
As we are currently in the alpha testing phase, bugs are expected, and performance might not meet expectations. However, this is exactly what we want to discover from **your** perspective as the end user!
Please share and discuss your feedback with us here:
- [Github Discussion: Multi-backend refactor: Alpha release ( AMD ROCm ONLY )](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1339)
- [Github Discussion: Multi-backend refactor: Alpha release ( Intel ONLY )](https://github.com/bitsandbytes-foundation/bitsandbytes/discussions/1338)
Thank you for your support!
## Benchmarks
### Intel
The following performance data was collected on an Intel 4th Gen Xeon (Sapphire Rapids) platform. The tables show the speed-up and memory usage of [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) with different data types.
#### Inference (CPU)
| Data Type | BF16 | INT8 | NF4 | FP4 |
|---|---|---|---|---|
| Speed-Up (vs BF16) | 1.0x | 0.6x | 2.3x | 0.03x |
| Memory (GB) | 13.1 | 7.6 | 5.0 | 4.6 |
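As a rough illustration of what the NF4 column corresponds to, here is a hedged sketch of 4-bit (NF4) inference through the `transformers` integration. The prompt, generation settings, and running everything on CPU without an accelerator are assumptions for illustration only.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"

# NF4 4-bit quantization, comparable to the NF4 column above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

inputs = tokenizer("What does 4-bit quantization do?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```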
#### Fine-Tuning (CPU)
| Data Type | AMP BF16 | INT8 | NF4 | FP4 |
|---|---|---|---|---|
| Speed-Up (vs AMP BF16) | 1.0x | 0.38x | 0.07x | 0.07x |
| Memory (GB) | 40 | 9 | 6.6 | 6.6 |
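For the fine-tuning rows, a typical setup attaches LoRA adapters on top of the 4-bit base model. The sketch below uses `peft` with illustrative hyperparameters (rank, alpha, target modules) that are assumptions rather than the benchmark's actual configuration.

```py
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"

# Load the base model in NF4, as in the inference sketch above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# Illustrative LoRA configuration; only the adapter weights are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```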
@@ -9,8 +9,6 @@ The `bitsandbytes.functional` API provides the low-level building blocks for the
 * For experimental or research purposes requiring non-standard quantization or performance optimizations.
 ## LLM.int8()
-[[autodoc]] functional.int8_double_quant
 [[autodoc]] functional.int8_linear_matmul
 [[autodoc]] functional.int8_mm_dequant
@@ -19,7 +17,6 @@ The `bitsandbytes.functional` API provides the low-level building blocks for the
 [[autodoc]] functional.int8_vectorwise_quant
 ## 4-bit
 [[autodoc]] functional.dequantize_4bit
@@ -49,5 +46,3 @@ For more details see [8-Bit Approximations for Parallelism in Deep Learning](htt
 ## Utility
 [[autodoc]] functional.get_ptr
-[[autodoc]] functional.is_on_gpu