The model can be loaded in 8-bit or 4-bit precision, greatly reducing the memory requirements while maintaining the performance of the original model. First make sure to install bitsandbytes (`pip install bitsandbytes`) and to have access to a CUDA-compatible GPU device. Simply change the snippet above as follows:
```python
from transformers import LlavaNextForConditionalGeneration, BitsAndBytesConfig
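import torch

# A minimal 4-bit loading sketch. The checkpoint id below is an assumption;
# substitute the checkpoint used in the snippet above.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",  # assumed checkpoint; reuse the one from above
    quantization_config=quantization_config,
    device_map="auto",
)
```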