Unverified Commit 4bd8f129 authored by Merve Noyan's avatar Merve Noyan Committed by GitHub

Fixes to chameleon docs (#32078)

* Fixes

* Let's not use auto
parent 566b0f1f
@@ -34,13 +34,13 @@ being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs
generation, all in a single model. It also matches or exceeds the performance of much larger models,
including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal
generation evaluation, where either the prompt or outputs contain mixed sequences of both images and
text. Chameleon marks a significant step forward in unified modeling of full multimodal documents*

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/chameleon_arch.png"
alt="drawing" width="600"/>

<small> Chameleon incorporates a vector quantizer module to transform images into discrete tokens. That also enables image generation using an auto-regressive transformer. Taken from the <a href="https://arxiv.org/abs/2405.09818v1">original paper.</a> </small>
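As a toy illustration of what a vector quantizer does (a sketch of nearest-neighbour codebook lookup, not Chameleon's actual VQ tokenizer): each image-patch embedding is replaced by the index of its closest codebook vector, which yields discrete tokens an autoregressive transformer can model. All names and values below are made up for illustration.

```python
def vq_tokenize(vectors, codebook):
    # Assign each vector the index of its nearest codebook entry (L2 distance).
    def nearest(v):
        return min(range(len(codebook)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(v, codebook[i])))
    return [nearest(v) for v in vectors]

# Toy 3-entry codebook of 2-d embeddings and three "patch" embeddings
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
patches = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]
tokens = vq_tokenize(patches, codebook)  # discrete token ids, one per patch
```

In the real model, decoding those ids back through the codebook (and a learned decoder) is what makes image generation possible with the same next-token machinery used for text.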
This model was contributed by [joaogante](https://huggingface.co/joaogante) and [RaushanTurganbay](https://huggingface.co/RaushanTurganbay).
The original code can be found [here](https://github.com/facebookresearch/chameleon).
@@ -61,6 +61,7 @@
### Single image inference
Chameleon is a gated model, so make sure you have access and are logged in to the Hugging Face Hub with an access token.
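For example, after requesting access to the checkpoint on its Hub page (assuming `huggingface_hub` is installed):

```shell
# Authenticate once; paste a read-scoped access token when prompted
huggingface-cli login
```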
Here's how to load the model and perform inference in half-precision (`torch.float16`):
```python
@@ -70,7 +71,7 @@ from PIL import Image
import requests

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b", torch_dtype=torch.float16, device_map="cuda")

# prepare image and text prompt
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
@@ -95,7 +96,8 @@ from PIL import Image
import requests

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b", torch_dtype=torch.float16, device_map="cuda")

# Get three different images
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
@@ -138,7 +140,7 @@ quantization_config = BitsAndBytesConfig(
    bnb_4bit_compute_dtype=torch.float16,
)

model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b", quantization_config=quantization_config, device_map="cuda")
```
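To see why 4-bit loading shrinks memory so much, here is a toy pure-Python sketch of absmax quantization. It illustrates the general idea only, not bitsandbytes' actual NF4 data type or its blockwise scaling.

```python
def quantize_absmax_4bit(values):
    # Scale so the largest magnitude maps to the signed 4-bit range [-7, 7],
    # then round each value to its nearest integer level.
    scale = max(abs(v) for v in values) / 7.0
    return [round(v / scale) for v in values], scale

def dequantize(codes, scale):
    # Recover approximate floats from the 4-bit codes.
    return [c * scale for c in codes]

weights = [0.12, -0.70, 0.33, 0.05]           # toy fp16-style weights
codes, scale = quantize_absmax_4bit(weights)  # 4 bits per weight instead of 16
approx = dequantize(codes, scale)             # close to, but not exactly, the originals
```

Each weight now needs only a 4-bit code plus a shared scale, roughly a 4x saving over fp16 at the cost of a small rounding error.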
### Use Flash-Attention 2 and SDPA to further speed-up generation
@@ -148,6 +150,7 @@ The model supports both Flash-Attention 2 and PyTorch's [`torch.nn.functional.
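Both backends compute the same scaled dot-product attention, softmax(QK&#8599;/&#8730;d)V, just with faster fused kernels. As background, a toy pure-Python sketch of that computation (not what either backend actually runs):

```python
import math

def sdpa(Q, K, V):
    # Q, K, V: lists of row vectors; returns softmax(Q K^T / sqrt(d)) V.
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)                          # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        w = [e / z for e in exps]                # attention weights, sum to 1
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out
```

Flash-Attention 2 and SDPA avoid materializing the full score matrix and fuse these steps into one kernel, which is where the speed-up comes from.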
```python
from transformers import ChameleonForConditionalGeneration
model_id = "facebook/chameleon-7b"
model = ChameleonForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,