Simple OCR annotation tool using Jupyter Notebook
STR (Scene Text Recognition), or as it is more widely known, OCR (Optical Character Recognition), consists of two main steps: (1) text detection and (2) text recognition. Text detection localizes the text in an image, and text recognition then reads the text inside each region that the detection step found. In the literature, text detection is treated as an object detection task, so its annotation requires bounding boxes around the text instances. The text recognition task requires the text label (the characters) for each image crop.
In this blog, we will demonstrate a simple way to annotate images for STR. Because it is simple, it should not be treated as an optimal solution; use it for a POC or to annotate a small number of images. For larger projects, there are many advanced tools and solutions that cover a variety of tasks, such as object detection, key-point annotation, and image captioning.
One of the most efficient ways to transcribe an image (OCR annotation) is to see the image and type its transcription right away in the same section of the page. This notebook shows each image crop with a text box where the user writes the word in the image.
We will start by importing the main libraries and setting the path of the images we need to annotate. ipywidgets will give us interactive controls inside the notebook.
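As a minimal sketch of this setup step, the cell below collects the image crops to annotate. The folder name `crops`, the extension set, and the helper `list_images` are assumptions made for illustration, not part of the original notebook; the ipywidgets import can live in the cell that actually builds the widgets.

```python
from pathlib import Path

# Hypothetical folder holding the image crops to annotate; change as needed.
IMAGE_DIR = Path("crops")
# Assumed set of image extensions; extend it if your data uses others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(image_dir, extensions=IMAGE_EXTENSIONS):
    """Return the image files in image_dir, sorted by name."""
    image_dir = Path(image_dir)
    if not image_dir.is_dir():
        return []  # nothing to annotate yet
    return sorted(
        p for p in image_dir.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


image_paths = list_images(IMAGE_DIR)
print(f"Found {len(image_paths)} images to annotate")
```

Sorting the paths keeps the display order stable between runs, which makes it easier to resume a session where you left off.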
Now we define the function that will display the images with a text box:
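One possible shape for such a function is sketched below. It is not the original notebook code: the names `prepare_items` and `show_annotation_form`, the 200-pixel width, and the placeholder text are all illustrative assumptions. It uses the `Image`, `Text`, and `VBox` widgets from ipywidgets and keys the text boxes by file name.

```python
from pathlib import Path


def prepare_items(image_paths, labels=None):
    """Pair every image with its raw bytes and an initial label."""
    labels = labels or {}
    items = []
    for path in map(Path, image_paths):
        # Empty string when nothing is pre-annotated for this file.
        initial = labels.get(path.name, "")
        items.append((path, path.read_bytes(), initial))
    return items


def show_annotation_form(items, width=200):
    """Show each image with a text box beneath it.

    Returns a dict mapping file name to its Text widget.
    """
    # Imported here so prepare_items stays usable outside Jupyter.
    import ipywidgets as widgets
    from IPython.display import display

    text_boxes = {}
    rows = []
    for path, image_bytes, initial in items:
        fmt = path.suffix.lstrip(".").lower()
        fmt = "jpeg" if fmt == "jpg" else fmt
        image = widgets.Image(value=image_bytes, format=fmt, width=width)
        text = widgets.Text(value=initial,
                            placeholder="Type the text in the image")
        text_boxes[path.name] = text
        rows.append(widgets.VBox([image, text]))
    display(widgets.VBox(rows))
    return text_boxes
```

In a notebook cell, this would be called as `text_boxes = show_annotation_form(prepare_items(image_paths))`, where `image_paths` is assumed to be a list of image files. After typing, the labels can be read back with `{name: box.value for name, box in text_boxes.items()}`.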
Once we run the previous cell, we should see the images with a text box beneath each one for the label or transcription. The initial label, which in our case is an empty string, can be replaced with pre-annotated labels, so the notebook can also be used only to fix existing annotations.
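To pre-fill the text boxes, the existing labels first have to be loaded into a lookup table. The sketch below assumes they are stored one per line as `file name<TAB>label` in a text file; both that format and the helper name `load_labels` are assumptions for illustration.

```python
from pathlib import Path


def load_labels(label_file):
    """Read 'file name<TAB>label' lines into a dict.

    A missing file simply yields an empty dict.
    """
    label_file = Path(label_file)
    labels = {}
    if not label_file.exists():
        return labels
    with label_file.open(encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # ignore blank lines
            # Split on the first tab only, so labels may contain spaces.
            name, _, label = line.partition("\t")
            labels[name] = label
    return labels
```

The resulting dict can then supply the initial value of each text box instead of the empty string.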
Before we export the data, here is a tip we highly recommend. Since this is not a perfect solution, annotate in batches to avoid big losses if something goes wrong with the notebook.
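A batch export could look like the following sketch, which appends each finished batch to a text file so that earlier batches survive a notebook crash. The tab-separated `file name<TAB>label` layout and the helper name `save_batch` are assumptions for illustration; `labels` is a dict mapping file names to the typed transcriptions (for instance, collected from the text boxes).

```python
from pathlib import Path


def save_batch(labels, output_file):
    """Append one batch of {file name: label} pairs to output_file.

    Each pair is written as a 'file name<TAB>label' line.
    Returns the number of lines written.
    """
    output_file = Path(output_file)
    written = 0
    with output_file.open("a", encoding="utf-8") as f:
        for name, label in labels.items():
            if not label.strip():
                continue  # skip crops that have not been transcribed yet
            f.write(f"{name}\t{label}\n")
            written += 1
    return written
```

Opening the file in append mode is a deliberate choice here: every call adds to what is already on disk rather than overwriting it, so a failure costs at most the current batch.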
A common approach to storing the data for training a recognition model uses the same structure we demonstrated.
You can see the notebook here: