Unverified Commit ac5d6ee6 authored by Steven Liu, committed by GitHub

[docs] implement API docs (#1075)



* optims

* fix path

* fix path

* mdx

* fix path

* toctree

* fix

* optimizer, adagrad

* add init

* add

* more apis

* params

* clarify

* run pre-commit hooks

---------
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
parent 87e029bc
# Lion
[Lion (Evolved Sign Momentum)](https://hf.co/papers/2302.06675) is a unique optimizer that uses the sign of the gradient to determine the update direction of the momentum. This makes Lion more memory-efficient and faster than [`AdamW`], which tracks and stores both the first and second-order moments.
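For example, the 8-bit variant can be used as a drop-in replacement for a PyTorch optimizer. The snippet below is a minimal sketch assuming a CUDA-capable setup; the toy model and hyperparameter values are placeholders, not recommendations.

```py
import torch
import bitsandbytes as bnb

# toy model and batch, for illustration only
model = torch.nn.Linear(64, 2).cuda()
batch = torch.randn(8, 64).cuda()
labels = torch.randint(0, 2, (8,)).cuda()

# 8-bit Lion; lr, betas, and weight_decay are illustrative values
optimizer = bnb.optim.Lion8bit(model.parameters(), lr=1e-4, betas=(0.9, 0.99), weight_decay=1e-2)

loss = torch.nn.functional.cross_entropy(model(batch), labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```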
## Lion[[api-class]]
[[autodoc]] bitsandbytes.optim.Lion
- __init__
## Lion8bit
[[autodoc]] bitsandbytes.optim.Lion8bit
- __init__
## Lion32bit
[[autodoc]] bitsandbytes.optim.Lion32bit
- __init__
## PagedLion
[[autodoc]] bitsandbytes.optim.PagedLion
- __init__
## PagedLion8bit
[[autodoc]] bitsandbytes.optim.PagedLion8bit
- __init__
## PagedLion32bit
[[autodoc]] bitsandbytes.optim.PagedLion32bit
- __init__
# Overview
[8-bit optimizers](https://hf.co/papers/2110.02861) reduce the memory footprint of 32-bit optimizers without any performance degradation, which means you can train large models with many parameters faster. At the core of 8-bit optimizers is block-wise quantization, which preserves accuracy, computational efficiency, and stability.
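In practice, an 8-bit optimizer is meant to be a drop-in replacement for its 32-bit PyTorch counterpart. A minimal sketch (the model and learning rate are placeholders):

```py
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(128, 10).cuda()

# replaces torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)
```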
bitsandbytes provides 8-bit optimizers through the base [`Optimizer8bit`] class, and additionally provides [`Optimizer2State`] and [`Optimizer1State`] for 2-state (for example, [`Adam`]) and 1-state (for example, [`Adagrad`]) optimizers, respectively. To provide custom optimizer hyperparameters, use the [`GlobalOptimManager`] class to configure the optimizer.
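For example, [`GlobalOptimManager`] can keep selected parameters in 32-bit optimizer states while the rest use 8-bit. The sketch below assumes parameters are registered before the model is moved to the GPU; the model and the overridden layer are placeholders.

```py
import torch
import bitsandbytes as bnb

mng = bnb.optim.GlobalOptimManager.get_instance()

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.Linear(64, 2))
mng.register_parameters(model.parameters())  # register while the parameters are still on the CPU

model = model.cuda()
optimizer = bnb.optim.Adam(model.parameters(), lr=1e-3, optim_bits=8)

# keep 32-bit optimizer states for the first layer's weight
mng.override_config(model[0].weight, "optim_bits", 32)
```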
## Optimizer8bit
[[autodoc]] bitsandbytes.optim.optimizer.Optimizer8bit
- __init__
## Optimizer2State
[[autodoc]] bitsandbytes.optim.optimizer.Optimizer2State
- __init__
## Optimizer1State
[[autodoc]] bitsandbytes.optim.optimizer.Optimizer1State
- __init__
## Utilities
[[autodoc]] bitsandbytes.optim.optimizer.GlobalOptimManager
# RMSprop
RMSprop is an adaptive learning rate optimizer that is very similar to [`Adagrad`]. RMSprop stores a *weighted average* of the squared past gradients for each parameter and uses it to scale that parameter's learning rate. This automatically lowers or raises the learning rate depending on the magnitude of the gradient, and it prevents the learning rate from diminishing.
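A minimal usage sketch of the 8-bit variant (the model and hyperparameter values are placeholders):

```py
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(64, 2).cuda()

# alpha is the smoothing constant for the weighted average of squared gradients
optimizer = bnb.optim.RMSprop8bit(model.parameters(), lr=1e-3, alpha=0.99)
```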
## RMSprop[[api-class]]
[[autodoc]] bitsandbytes.optim.RMSprop
## RMSprop8bit
[[autodoc]] bitsandbytes.optim.RMSprop8bit
## RMSprop32bit
[[autodoc]] bitsandbytes.optim.RMSprop32bit
# SGD
Stochastic gradient descent (SGD) is a basic optimizer that minimizes the loss for a set of model parameters by updating them in the opposite direction of the gradient. The update is computed on a randomly sampled mini-batch of data from the dataset.
bitsandbytes also supports momentum and Nesterov momentum to accelerate SGD by adding a weighted average of past gradients to the current gradient.
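A minimal usage sketch of 8-bit SGD with momentum (the model and values are placeholders):

```py
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(64, 2).cuda()

# 8-bit SGD with momentum; lr and momentum are illustrative values
optimizer = bnb.optim.SGD8bit(model.parameters(), lr=1e-2, momentum=0.9)
```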
## SGD[[api-class]]
[[autodoc]] bitsandbytes.optim.SGD
- __init__
## SGD8bit
[[autodoc]] bitsandbytes.optim.SGD8bit
- __init__
## SGD32bit
[[autodoc]] bitsandbytes.optim.SGD32bit
- __init__