"git@developer.sourcefind.cn:chenpangpang/ComfyUI.git" did not exist on "0be92710b4e11cdccfc30a0ea0bf107917826fc5"
Unverified commit 56a77457, authored by NielsRogge, committed by GitHub

[Chameleon, Hiera] Improve docs (#32038)

* Improve docs

* Fix docs

* Fix code snippet
parent b873234c
docs/source/en/_toc.yml

@@ -326,8 +326,6 @@
       title: CamemBERT
     - local: model_doc/canine
       title: CANINE
-    - local: model_doc/chameleon
-      title: chameleon
     - local: model_doc/codegen
       title: CodeGen
     - local: model_doc/code_llama
@@ -760,6 +758,8 @@
       title: BridgeTower
     - local: model_doc/bros
       title: BROS
+    - local: model_doc/chameleon
+      title: chameleon
     - local: model_doc/chinese_clip
       title: Chinese-CLIP
     - local: model_doc/clip
docs/source/en/model_doc/chameleon.md

@@ -69,13 +69,13 @@ import torch
 from PIL import Image
 import requests

-processor = ChameleonProcessor.from_pretrained("meta-chameleon")
-model = ChameleonForConditionalGeneration.from_pretrained("meta-chameleon", torch_dtype=torch.float16, device_map="auto")
+processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
+model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b", torch_dtype=torch.float16, device_map="auto")

 # prepare image and text prompt
-url = "https://bjiujitsu.com/wp-content/uploads/2021/01/jiu_jitsu_belt_white_1.jpg"
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
 image = Image.open(requests.get(url, stream=True).raw)
-prompt = "What color is the belt in this image?<image>"
+prompt = "What do you see in this image?<image>"

 inputs = processor(prompt, image, return_tensors="pt").to(model.device)
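For reference, the updated single-image snippet in full would read roughly as follows. This is a sketch assuming access to the `facebook/chameleon-7b` checkpoint referenced in the diff; the final generate/decode lines are not part of the hunk above and are assumed from the usual generation pattern.

```python
import torch
import requests
from PIL import Image
from transformers import ChameleonProcessor, ChameleonForConditionalGeneration

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b", torch_dtype=torch.float16, device_map="auto"
)

# prepare image and text prompt; "<image>" marks where the image is inserted
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "What do you see in this image?<image>"

inputs = processor(prompt, image, return_tensors="pt").to(model.device)

# autoregressively generate and decode (assumed continuation, not shown in the hunk)
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```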
@@ -94,8 +94,8 @@ import torch
 from PIL import Image
 import requests

-processor = ChameleonProcessor.from_pretrained("meta-chameleon")
-model = ChameleonForConditionalGeneration.from_pretrained("meta-chameleon", torch_dtype=torch.float16, device_map="auto")
+processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
+model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b", torch_dtype=torch.float16, device_map="auto")

 # Get three different images
 url = "https://www.ilankelman.org/stopsigns/australia.jpg"
@@ -115,7 +115,7 @@ prompts = [
 # We can simply feed images in the order they have to be used in the text prompt
 # Each "<image>" token uses one image leaving the next for the subsequent "<image>" tokens
-inputs = processor(text=prompts, images=[image_stop, image_cats, image_snowman], padding=True, return_tensors="pt").to(model.device)
+inputs = processor(text=prompts, images=[image_stop, image_cats, image_snowman], padding=True, return_tensors="pt").to(device="cuda", dtype=torch.float16)

 # Generate
 generate_ids = model.generate(**inputs, max_new_tokens=50)
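Stitched together, the batched multi-image usage these hunks touch would look roughly like the sketch below. The cats and snowman image URLs and the exact prompt strings are assumptions inferred from the variable names and the `<image>` placeholder convention; only the stop-sign URL, the comments, and the final lines appear in the diff. Note the explicit `.to(device="cuda", dtype=torch.float16)`: the padded batch is created on CPU in the default dtype, and `BatchFeature.to` casts only the floating-point tensors (pixel values), leaving the token ids untouched.

```python
import torch
import requests
from PIL import Image
from transformers import ChameleonProcessor, ChameleonForConditionalGeneration

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b", torch_dtype=torch.float16, device_map="auto"
)

# Get three different images (the last two URLs are assumed stand-ins)
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image_stop = Image.open(requests.get(url, stream=True).raw)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_cats = Image.open(requests.get(url, stream=True).raw)
url = "https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/snowman.jpg"
image_snowman = Image.open(requests.get(url, stream=True).raw)

# Hypothetical batched prompts; each "<image>" placeholder consumes the next image in order
prompts = [
    "What do these images have in common?<image><image>",
    "<image>What is shown in this image?",
]

inputs = processor(
    text=prompts,
    images=[image_stop, image_cats, image_snowman],
    padding=True,
    return_tensors="pt",
).to(device="cuda", dtype=torch.float16)

# Generate and decode both sequences
generate_ids = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(generate_ids, skip_special_tokens=True))
```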
docs/source/en/model_doc/dinov2.md

@@ -57,7 +57,7 @@ print((last_hidden_states - traced_outputs[0]).abs().max())

 ## Resources

-A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DPT.
+A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DINOv2.

 - Demo notebooks for DINOv2 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DINOv2). 🌎
docs/source/en/model_doc/hiera.md

@@ -26,8 +26,22 @@ The abstract from the paper is the following:

 *Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance. While these components lead to effective accuracies and attractive FLOP counts, the added complexity actually makes these transformers slower than their vanilla ViT counterparts. In this paper, we argue that this additional bulk is unnecessary. By pretraining with a strong visual pretext task (MAE), we can strip out all the bells-and-whistles from a state-of-the-art multi-stage vision transformer without losing accuracy. In the process, we create Hiera, an extremely simple hierarchical vision transformer that is more accurate than previous models while being significantly faster both at inference and during training. We evaluate Hiera on a variety of tasks for image and video recognition. Our code and models are available at https://github.com/facebookresearch/hiera.*

+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/hiera_overview.png"
+alt="drawing" width="600"/>
+
+<small> Hiera architecture. Taken from the <a href="https://arxiv.org/abs/2306.00989">original paper</a>. </small>
+
 This model was a joint contribution by [EduardoPacheco](https://huggingface.co/EduardoPacheco) and [namangarg110](https://huggingface.co/namangarg110). The original code can be found [here](https://github.com/facebookresearch/hiera).

+## Resources
+
+A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Hiera. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
+
+<PipelineTag pipeline="image-classification"/>
+
+- [`HieraForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
+- See also: [Image classification task guide](../tasks/image_classification)
+
 ## HieraConfig

 [[autodoc]] HieraConfig
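To make the new image-classification pointer concrete, here is a minimal inference sketch with `HieraForImageClassification`. The checkpoint name is an assumption (an ImageNet-1k fine-tuned Hiera checkpoint on the Hub) and does not come from the diff.

```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, HieraForImageClassification

# Assumed checkpoint name for an ImageNet-1k fine-tuned Hiera model
checkpoint = "facebook/hiera-base-224-in1k-hf"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = HieraForImageClassification.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# map the highest-scoring logit to its label
predicted_label = model.config.id2label[logits.argmax(-1).item()]
print(predicted_label)
```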
src/transformers/models/chameleon/convert_chameleon_weights_to_hf.py

@@ -444,7 +444,7 @@ def main():
         "--model_size",
         choices=["7B", "30B"],
         help=""
-        " models correspond to the finetuned versions, and are specific to the Chameleon official release. For more details on Chameleon, checkout the original repo: https://huggingface.co/meta-chameleon",
+        " models correspond to the finetuned versions, and are specific to the Chameleon official release. For more details on Chameleon, check out the original repo: https://github.com/facebookresearch/chameleon",
     )
     parser.add_argument(
         "--output_dir",
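For orientation, the conversion script touched here is typically invoked along these lines. Only `--model_size` and `--output_dir` are visible in the hunk; the `--input_dir` flag is an assumption modeled on similar weight-conversion scripts in the repo.

```bash
# Hypothetical invocation; --input_dir is assumed, the other flags appear in the hunk
python src/transformers/models/chameleon/convert_chameleon_weights_to_hf.py \
    --input_dir /path/to/chameleon/weights \
    --model_size 7B \
    --output_dir /path/to/hf/checkpoint
```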