Unverified Commit ac5d6ee6 authored by Steven Liu, committed by GitHub

[docs] implement API docs (#1075)



* optims

* fix path

* fix path

* mdx

* fix path

* toctree

* fix

* optimizer, adagrad

* add init

* add

* more apis

* params

* clarify

* run pre-commit hooks

---------
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
parent 87e029bc
# Lion
[Lion (Evolved Sign Momentum)](https://hf.co/papers/2302.06675) is a unique optimizer that uses the sign of the gradient to determine the update direction of the momentum. This makes Lion more memory-efficient and faster than [`AdamW`], which tracks and stores both the first and second-order moments.
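For example, the 8-bit variant can be used as a drop-in replacement for a PyTorch optimizer. The snippet below is a minimal sketch assuming a CUDA-capable setup; the toy model and hyperparameter values are placeholders, not recommendations.

```py
import torch
import bitsandbytes as bnb

# toy model and batch, for illustration only
model = torch.nn.Linear(64, 2).cuda()
batch = torch.randn(8, 64).cuda()
labels = torch.randint(0, 2, (8,)).cuda()

# 8-bit Lion; lr, betas, and weight_decay are illustrative values
optimizer = bnb.optim.Lion8bit(model.parameters(), lr=1e-4, betas=(0.9, 0.99), weight_decay=1e-2)

loss = torch.nn.functional.cross_entropy(model(batch), labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```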
## Lion[[api-class]]
[[autodoc]] bitsandbytes.optim.Lion
- __init__
## Lion8bit
[[autodoc]] bitsandbytes.optim.Lion8bit
- __init__
## Lion32bit
[[autodoc]] bitsandbytes.optim.Lion32bit
- __init__
## PagedLion
[[autodoc]] bitsandbytes.optim.PagedLion
- __init__
## PagedLion8bit
[[autodoc]] bitsandbytes.optim.PagedLion8bit
- __init__
## PagedLion32bit
[[autodoc]] bitsandbytes.optim.PagedLion32bit
- __init__
# Overview
[8-bit optimizers](https://hf.co/papers/2110.02861) reduce the memory footprint of 32-bit optimizers without any performance degradation, which means you can train large models with many parameters faster. At the core of 8-bit optimizers is block-wise quantization, which preserves accuracy, computational efficiency, and stability.
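In practice, an 8-bit optimizer is meant to be a drop-in replacement for its 32-bit PyTorch counterpart. A minimal sketch (the model and learning rate are placeholders):

```py
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(128, 10).cuda()

# replaces torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)
```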
bitsandbytes provides 8-bit optimizers through the base [`Optimizer8bit`] class, and additionally provides [`Optimizer2State`] and [`Optimizer1State`] for 2-state (for example, [`Adam`]) and 1-state (for example, [`Adagrad`]) optimizers, respectively. To provide custom optimizer hyperparameters, use the [`GlobalOptimManager`] class to configure the optimizer.
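For example, [`GlobalOptimManager`] can keep selected parameters in 32-bit optimizer states while the rest use 8-bit. The sketch below assumes parameters are registered before the model is moved to the GPU; the model and the overridden layer are placeholders.

```py
import torch
import bitsandbytes as bnb

mng = bnb.optim.GlobalOptimManager.get_instance()

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.Linear(64, 2))
mng.register_parameters(model.parameters())  # register while the parameters are still on the CPU

model = model.cuda()
optimizer = bnb.optim.Adam(model.parameters(), lr=1e-3, optim_bits=8)

# keep 32-bit optimizer states for the first layer's weight
mng.override_config(model[0].weight, "optim_bits", 32)
```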
## Optimizer8bit
[[autodoc]] bitsandbytes.optim.optimizer.Optimizer8bit
- __init__
## Optimizer2State
[[autodoc]] bitsandbytes.optim.optimizer.Optimizer2State
- __init__
## Optimizer1State
[[autodoc]] bitsandbytes.optim.optimizer.Optimizer1State
- __init__
## Utilities
[[autodoc]] bitsandbytes.optim.optimizer.GlobalOptimManager
# RMSprop
RMSprop is an adaptive learning rate optimizer that is very similar to [`Adagrad`]. RMSprop stores a *weighted average* of the squared past gradients for each parameter and uses it to scale that parameter's learning rate. This automatically lowers or raises the learning rate depending on the magnitude of the gradient, and it prevents the learning rate from diminishing.
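A minimal usage sketch of the 8-bit variant (the model and hyperparameter values are placeholders):

```py
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(64, 2).cuda()

# alpha is the smoothing constant for the weighted average of squared gradients
optimizer = bnb.optim.RMSprop8bit(model.parameters(), lr=1e-3, alpha=0.99)
```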
## RMSprop[[api-class]]
[[autodoc]] bitsandbytes.optim.RMSprop
## RMSprop8bit
[[autodoc]] bitsandbytes.optim.RMSprop8bit
## RMSprop32bit
[[autodoc]] bitsandbytes.optim.RMSprop32bit
# SGD
Stochastic gradient descent (SGD) is a basic optimizer that minimizes the loss for a set of model parameters by updating them in the opposite direction of the gradient. The update is computed on a randomly sampled mini-batch of data from the dataset.
bitsandbytes also supports momentum and Nesterov momentum to accelerate SGD by adding a weighted average of past gradients to the current gradient.
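A minimal usage sketch of 8-bit SGD with momentum (the model and values are placeholders):

```py
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(64, 2).cuda()

# 8-bit SGD with momentum; lr and momentum are illustrative values
optimizer = bnb.optim.SGD8bit(model.parameters(), lr=1e-2, momentum=0.9)
```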
## SGD[[api-class]]
[[autodoc]] bitsandbytes.optim.SGD
- __init__
## SGD8bit
[[autodoc]] bitsandbytes.optim.SGD8bit
- __init__
## SGD32bit
[[autodoc]] bitsandbytes.optim.SGD32bit
- __init__