"...git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "4aa630eeabf588014c0fb0d09f7bb0fa7c2d71e5"
Unverified Commit 5d997f22 authored by Merve Noyan, committed by GitHub

Image-to-Image Task Guide (#26595)



* img2img task guide

* Update year

* Add to toctree

* Update docs/source/en/tasks/image_to_image.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/tasks/image_to_image.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/tasks/image_to_image.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/tasks/image_to_image.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/tasks/image_to_image.md
Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Update docs/source/en/tasks/image_to_image.md
Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Addressed comments

* Update docs/source/en/tasks/image_to_image.md
Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Addressed comments

---------
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Maria Khalusova <kafooster@gmail.com>
parent 5c081e29
@@ -71,6 +71,8 @@
       title: Zero-shot image classification
     - local: tasks/monocular_depth_estimation
       title: Depth estimation
+    - local: tasks/image_to_image
+      title: Image-to-Image
     title: Computer Vision
   - isExpanded: false
     sections:
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Image-to-Image Task Guide
[[open-in-colab]]
Image-to-image is the task where an application receives an image and outputs another image. It has various subtasks, including image enhancement (super resolution, low light enhancement, deraining and so on), image inpainting, and more.
This guide will show you how to:
- Use an image-to-image pipeline for a super-resolution task,
- Run image-to-image models for the same task without a pipeline.
Note that, as of the time this guide is released, the `image-to-image` pipeline only supports the super-resolution task.
Let's begin by installing the necessary libraries.
```bash
pip install transformers
```
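The examples in this guide also use PyTorch and PIL, so if they are not already in your environment you will likely want to install them as well (the package names below are the usual PyPI ones):

```bash
pip install torch pillow
```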
We can now initialize the pipeline with a [Swin2SR model](https://huggingface.co/caidas/swin2SR-lightweight-x2-64) and run inference by calling the pipeline on an image. As of now, only [Swin2SR models](https://huggingface.co/models?sort=trending&search=swin2sr) are supported in this pipeline.
```python
import torch
from transformers import pipeline

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
pipe = pipeline(task="image-to-image", model="caidas/swin2SR-lightweight-x2-64", device=device)
```
Now, let's load an image.
```python
from PIL import Image
import requests
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/cat.jpg"
image = Image.open(requests.get(url, stream=True).raw)
print(image.size)
```
```bash
# (532, 432)
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/cat.jpg" alt="Photo of a cat"/>
</div>
We can now do inference with the pipeline. We will get an upscaled version of the cat image.
```python
upscaled = pipe(image)
print(upscaled.size)
```
```bash
# (1072, 880)
```
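The pipeline returns the result as a `PIL.Image`, so you can also save it to disk or display it directly. Below is a minimal sketch; the output filename is just an example:

```python
# save the upscaled image and compare its size against the original
upscaled.save("cat_upscaled.png")
print(f"original size: {image.size}, upscaled size: {upscaled.size}")
```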
If you wish to do inference yourself without the pipeline, you can use the `Swin2SRForImageSuperResolution` and `Swin2SRImageProcessor` classes of Transformers. We will use the same model checkpoint for this. Let's initialize the model and the processor.
```python
from transformers import Swin2SRForImageSuperResolution, Swin2SRImageProcessor

model = Swin2SRForImageSuperResolution.from_pretrained("caidas/swin2SR-lightweight-x2-64").to(device)
processor = Swin2SRImageProcessor.from_pretrained("caidas/swin2SR-lightweight-x2-64")
```
The `pipeline` abstracts away the preprocessing and postprocessing steps, so now we have to do them ourselves. Let's preprocess the image: pass it to the processor and then move the pixel values to the GPU.
```python
pixel_values = processor(image, return_tensors="pt").pixel_values
print(pixel_values.shape)
pixel_values = pixel_values.to(device)
```
We can now run inference by passing the pixel values to the model.
```python
import torch

with torch.no_grad():
    outputs = model(pixel_values)
```
The output is an object of type `ImageSuperResolutionOutput` that looks like this 👇
```
(loss=None, reconstruction=tensor([[[[0.8270, 0.8269, 0.8275, ..., 0.7463, 0.7446, 0.7453],
[0.8287, 0.8278, 0.8283, ..., 0.7451, 0.7448, 0.7457],
[0.8280, 0.8273, 0.8269, ..., 0.7447, 0.7446, 0.7452],
...,
[0.5923, 0.5933, 0.5924, ..., 0.0697, 0.0695, 0.0706],
[0.5926, 0.5932, 0.5926, ..., 0.0673, 0.0687, 0.0705],
[0.5927, 0.5914, 0.5922, ..., 0.0664, 0.0694, 0.0718]]]],
device='cuda:0'), hidden_states=None, attentions=None)
```
We need to get the `reconstruction` and post-process it for visualization. Let's see how it looks.
```python
outputs.reconstruction.data.shape
# torch.Size([1, 3, 880, 1072])
```
We need to squeeze the output to get rid of axis 0, clip the values, then convert the tensor to a NumPy float array. We will then rearrange the axes so the channels come last (giving shape [880, 1072, 3]) and, finally, bring the values back to the [0, 255] range.
```python
import numpy as np
# squeeze, take to CPU and clip the values
output = outputs.reconstruction.data.squeeze().cpu().clamp_(0, 1).numpy()
# rearrange the axes
output = np.moveaxis(output, source=0, destination=-1)
# bring values back to pixel values range
output = (output * 255.0).round().astype(np.uint8)
Image.fromarray(output)
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/cat_upscaled.png" alt="Upscaled photo of a cat"/>
</div>
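If you plan to upscale several images this way, you can wrap the manual steps above into a small helper. The sketch below simply reuses the `model`, `processor` and `device` defined earlier; the `upscale` function name is made up for illustration:

```python
def upscale(image, model, processor, device):
    """Upscale a PIL image with a Swin2SR checkpoint and return a PIL image."""
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
    with torch.no_grad():
        outputs = model(pixel_values)
    # squeeze the batch dimension, clip to [0, 1] and move the channels axis last
    output = outputs.reconstruction.data.squeeze().cpu().clamp_(0, 1).numpy()
    output = np.moveaxis(output, source=0, destination=-1)
    # scale the values back to the [0, 255] pixel range
    output = (output * 255.0).round().astype(np.uint8)
    return Image.fromarray(output)

upscaled_again = upscale(image, model, processor, device)
```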