# AdamW

[AdamW](https://hf.co/papers/1711.05101) is a variant of the [`Adam`] optimizer that decouples weight decay from the gradient update. It is based on the observation that L2 regularization and weight decay are equivalent for [`SGD`] but not for adaptive optimizers like [`Adam`], so weight decay should be applied to the parameters directly rather than folded into the gradient.
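
A minimal usage sketch (assuming PyTorch and a CUDA-capable GPU); [`AdamW`] is a drop-in replacement for `torch.optim.AdamW`:

```py
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(512, 512).cuda()

# drop-in replacement for torch.optim.AdamW
optimizer = bnb.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

# standard training step
loss = model(torch.randn(8, 512, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```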

bitsandbytes also supports paged optimizers, which take advantage of CUDA's unified memory to page optimizer state from the GPU to the CPU when GPU memory is exhausted.
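
For example, a sketch of swapping in the paged 8-bit variant (same assumptions as above):

```py
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(512, 512).cuda()

# optimizer state is paged between GPU and CPU memory under memory pressure
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=1e-3)
```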

## AdamW[[api-class]]

[[autodoc]] bitsandbytes.optim.AdamW
    - __init__

## AdamW8bit

[[autodoc]] bitsandbytes.optim.AdamW8bit
    - __init__

## AdamW32bit

[[autodoc]] bitsandbytes.optim.AdamW32bit
    - __init__

## PagedAdamW

[[autodoc]] bitsandbytes.optim.PagedAdamW
    - __init__

## PagedAdamW8bit

[[autodoc]] bitsandbytes.optim.PagedAdamW8bit
    - __init__

## PagedAdamW32bit

[[autodoc]] bitsandbytes.optim.PagedAdamW32bit
    - __init__