# Adam

[Adam (Adaptive moment estimation)](https://hf.co/papers/1412.6980) is an adaptive learning rate optimizer that combines the momentum of [`SGD`] with the per-parameter learning rate scaling of [`RMSprop`]. It tracks two statistics:

- a weighted average of the past gradients, which provides the update direction (first moment)
- a weighted average of the past *squared* gradients, which adapts the learning rate to each parameter (second moment)
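The two moments above drive the update rule. As an illustration only (not the bitsandbytes implementation), a single Adam step for one scalar parameter can be sketched like this, with the standard hyperparameter defaults (`lr`, `beta1`, `beta2`, `eps`) and the bias correction applied at step `t`:

```python
def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return (updated param, updated first moment m, updated second moment v).

    Illustrative single-parameter sketch of the Adam update rule.
    """
    m = beta1 * m + (1 - beta1) * grad       # first moment: weighted average of gradients
    v = beta2 * v + (1 - beta2) * grad**2    # second moment: weighted average of squared gradients
    m_hat = m / (1 - beta1**t)               # bias correction (moments start at zero)
    v_hat = v / (1 - beta2**t)
    param = param - lr * m_hat / (v_hat**0.5 + eps)
    return param, m, v

# One step on a toy parameter with gradient 0.5:
p, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

Because the second moment rescales the step by the gradient's typical magnitude, parameters with consistently large gradients take smaller effective steps than parameters with small ones.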

bitsandbytes also supports paged optimizers, which take advantage of CUDA's unified memory to page optimizer state from the GPU to the CPU when GPU memory is exhausted.

## Adam[[api-class]]

[[autodoc]] bitsandbytes.optim.Adam
    - __init__

## Adam8bit

[[autodoc]] bitsandbytes.optim.Adam8bit
    - __init__

## Adam32bit

[[autodoc]] bitsandbytes.optim.Adam32bit
    - __init__

## PagedAdam

[[autodoc]] bitsandbytes.optim.PagedAdam
    - __init__

## PagedAdam8bit

[[autodoc]] bitsandbytes.optim.PagedAdam8bit
    - __init__

## PagedAdam32bit

[[autodoc]] bitsandbytes.optim.PagedAdam32bit
    - __init__