- Added Adagrad (without grad clipping) as 32-bit and 8-bit block-wise optimizer.
- Added AdamW (copy of Adam with weight decay init 1e-2). #10
- Introduced ModuleConfig overrides which can be seamlessly used at initialization time of a module.
- Added `bnb.nn.Embedding` layer which runs at 32-bit but without the layernorm. This works well if you need to fine-tune pretrained models that do not have an embedding layer norm. #19
Bug fixes:
- Fixed a bug where weight decay was incorrectly applied to 32-bit Adam. #13
- Fixed an unsafe use of eval. #8
- Fixed a bug where the StableEmbedding layer 32-bit optimizer override would not work without registering the whole model first (`bnb.optim.GlobalOptimManager.get_instance().register_parameters(model.parameters())`). #13 #15
Docs:
- Added instructions on how to solve `__fatbinwrap_` errors.
__If you encounter any other error not listed here, please create an issue. This will help resolve your problem and will help out others in the future.__
# fatbinwrap
This error occurs if there is a mismatch between the CUDA version of the C++ library and the CUDA runtime. Make sure you have the right CUDA version in your $PATH and $LD_LIBRARY_PATH variables. In the conda base environment you can find the library under:
```bash
ls $CONDA_PREFIX/lib/*cudart*
```
Make sure this path is appended to the `LD_LIBRARY_PATH` variable so bnb can find the CUDA runtime library (cudart).
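For example, if cudart lives under the conda path shown above, you could append it like this (adjust the path if your CUDA runtime is installed elsewhere):
```bash
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CONDA_PREFIX/lib"
```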
If this does not fix the issue, please try [compilation from source](compile_from_source.md) next.
If this does not work, please open an issue and paste the environment printed when you run `make`, along with the error you get when running bnb.
If you want to optimize some unstable parameters with 32-bit Adam and others with 8-bit Adam, you can use the `GlobalOptimManager`. With this, we can also configure specific hyperparameters for particular layers, such as embedding layers. To do that, we need two things: (1) register the parameters while they are still on the CPU, and (2) override the config with the new desired hyperparameters (anytime, anywhere). See our [guide](howto_config_override.md) for more details.
For global overrides in many different places in your code, you can do the following:
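For example (a sketch; `MyModel` and the layer names `fc1`, `special`, and `also_special` are placeholders for your own model):
```python
import torch
import bitsandbytes as bnb

mng = bnb.optim.GlobalOptimManager.get_instance()

model = MyModel()
mng.register_parameters(model.parameters())  # 1. register parameters while they are still on the CPU

model = model.cuda()
# use 8-bit Adam states for all parameters by default
adam = bnb.optim.Adam(model.parameters(), lr=0.001, optim_bits=8)

# 2a. override: the parameter model.fc1.weight now uses 32-bit Adam
mng.override_config(model.fc1.weight, 'optim_bits', 32)

# 2b. override: these two layers get a different learning rate and betas
mng.override_config([model.special.weight, model.also_special.weight],
                    key_value_dict={'lr': 1e-5, 'betas': (0.9, 0.98)})
```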
Possible options for the config override are: `betas, eps, weight_decay, lr, optim_bits, min_8bit_size, percentile_clipping, block_wise, max_unorm`
For overrides of particular layers, we recommend overriding locally in each module. You can do this by passing the module, the parameter, and its attribute name to the `GlobalOptimManager`:
```python
import torch
import bitsandbytes as bnb

class MyModule(torch.nn.Module):
    def __init__(self, din, dout):
        super(MyModule, self).__init__()
        self.linear = torch.nn.Linear(din, dout)
        # optimization will happen in 32-bit and
        # the learning rate will be set to 0.0001, independent of the main learning rate
        config = {'optim_bits': 32, 'lr': 0.0001}
        bnb.optim.GlobalOptimManager.get_instance().register_module_override(self, 'weight', config)
```
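A hypothetical usage sketch (assuming the module above; the bnb optimizer is created as usual and consults the manager for the registered per-module override):
```python
layer = MyModule(1024, 1024).cuda()  # hypothetical sizes
adam = bnb.optim.Adam(layer.parameters(), lr=0.001, optim_bits=8)
```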