*This model was released on 2022-09-08 and added to Hugging Face Transformers on 2023-03-13.* # MGP-STR
MGP-STR architecture. Taken from the original paper.
MGP-STR is trained on two synthetic datasets [MJSynth](http://www.robots.ox.ac.uk/~vgg/data/text/) (MJ) and [SynthText](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) (ST) without fine-tuning on other datasets. It achieves state-of-the-art results on six standard Latin scene text benchmarks, including 3 regular text datasets (IC13, SVT, IIIT) and 3 irregular ones (IC15, SVTP, CUTE).
This model was contributed by [yuekun](https://huggingface.co/yuekun). The original code can be found [here](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/MGP-STR).
## Inference example
[`MgpstrModel`] accepts images as input and generates three types of predictions, which represent textual information at different granularities.
The three types of predictions are fused to give the final prediction result.
The [`ViTImageProcessor`] class is responsible for preprocessing the input image and
[`MgpstrTokenizer`] decodes the generated character tokens to the target string. The
[`MgpstrProcessor`] wraps [`ViTImageProcessor`] and [`MgpstrTokenizer`]
into a single instance to both extract the input features and decode the predicted token ids.
- Step-by-step Optical Character Recognition (OCR)
```py
>>> from transformers import MgpstrProcessor, MgpstrForSceneTextRecognition
>>> import requests
>>> from PIL import Image
>>> processor = MgpstrProcessor.from_pretrained('alibaba-damo/mgp-str-base')
>>> model = MgpstrForSceneTextRecognition.from_pretrained('alibaba-damo/mgp-str-base')
>>> # load image from the IIIT-5k dataset
>>> url = "https://i.postimg.cc/ZKwLg2Gw/367-14.png"
>>> image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
>>> pixel_values = processor(images=image, return_tensors="pt").pixel_values
>>> outputs = model(pixel_values)
>>> generated_text = processor.batch_decode(outputs.logits)['generated_text']
```
## MgpstrConfig
[[autodoc]] MgpstrConfig
## MgpstrTokenizer
[[autodoc]] MgpstrTokenizer
- save_vocabulary
## MgpstrProcessor
[[autodoc]] MgpstrProcessor
- __call__
- batch_decode
## MgpstrModel
[[autodoc]] MgpstrModel
- forward
## MgpstrForSceneTextRecognition
[[autodoc]] MgpstrForSceneTextRecognition
- forward