""" Implements the LAMB algorithm (Layer-wise Adaptive Moments optimizer for Batch training).
Adapted from the huggingface/transformers Adam optimizer.
Inspired by the Google Research implementation available in ALBERT: https://github.com/google-research/google-research/blob/master/albert/lamb_optimizer.py
Inspired by cybertronai's PyTorch LAMB implementation: https://github.com/cybertronai/pytorch-lamb/blob/master/pytorch_lamb/lamb.py