Thursday, August 6, 2009

Activity 10 Preprocessing Text

In our past few activities we learned various ways of manipulating images, both in the frequency domain and through morphological operations. This time we will use all that we have learned to extract handwriting and other text from a scanned document.


Figure 1. (Left) Original document and (Right) the schematic showing how the tilt angle was calculated.

Looking at the given document, we first notice that it is slightly rotated, so our first task is to straighten it. We do this using the 'rotate' argument of the mogrify() function in Scilab. This function edits or distorts an image in numerous ways; in this case we use it for rotation. Before we rotate, however, we must first determine the tilt angle of the image. We measure this angle by drawing a horizontal line across the lines of the document and calculating the angle between them by simple trigonometry (Figure 1). Using this method, we calculated an angle of 1.05 degrees. The results of the rotation are shown in Figure 2.
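The trigonometry behind the tilt measurement can be sketched in a few lines. This is a Python illustration rather than the Scilab used in the activity, and the function name and endpoint coordinates are hypothetical, chosen only to show the calculation; they are not the values read off our document.

```python
import math

def tilt_angle_degrees(p_left, p_right):
    """Angle between the segment p_left -> p_right and the horizontal.

    Points are (x, y) pixel coordinates. In image coordinates y grows
    downward, so a positive result means the line drops to the right.
    """
    dx = p_right[0] - p_left[0]
    dy = p_right[1] - p_left[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical endpoints picked along one line of the document.
angle = tilt_angle_degrees((100, 400), (900, 415))
print(f"tilt angle: {angle:.2f} degrees")
```

Picking the two endpoints as far apart as possible reduces the effect of a one-pixel error in either of them on the measured angle.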


Figure 2. The result of rectifying the whole image and also a selected area.

We also note that the horizontal lines of the image are not needed and that they interfere with our objective. To remove these lines, we first invert the image (white over black) and convert it to grayscale, then apply a filter or mask in inverse space. This is similar to what we did in Activity 7. Since we already know that the FT of horizontal lines lies along the vertical axis, we simply create a mask that removes this portion of the image's FT. After applying the mask, we binarize the image to remove the background. The mask and the resulting images are shown in Figure 3, together with the Fourier transform of the image.
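The idea of masking the vertical axis of the FT can be tried out on a tiny synthetic image. The following is a minimal Python sketch, not the Scilab code used for the figures: the 12x12 test image, the naive `dft2` helper, and the 0.5 threshold are all assumptions made for illustration. The mask here removes the whole zero-horizontal-frequency column except the DC term.

```python
import cmath

def dft2(a, inverse=False):
    """Naive 2-D DFT, O(N^4); only suitable for a tiny demo image."""
    rows, cols = len(a), len(a[0])
    sign = 1 if inverse else -1
    scale = rows * cols if inverse else 1
    return [[sum(a[y][x] * cmath.exp(sign * 2j * cmath.pi * (u * y / rows + v * x / cols))
                 for y in range(rows) for x in range(cols)) / scale
             for v in range(cols)]
            for u in range(rows)]

# Synthetic inverted image (white on black): three horizontal lines
# plus one vertical pen stroke that crosses the middle line.
size = 12
image = [[0.0] * size for _ in range(size)]
for y in (2, 6, 10):
    image[y] = [1.0] * size
for y in range(3, 9):
    image[y][5] = 1.0

# Horizontal lines do not vary along x, so their energy sits in the
# v = 0 column of the spectrum. Zero that column, keeping the DC term.
spectrum = dft2(image)
for u in range(1, size):
    spectrum[u][0] = 0

filtered = dft2(spectrum, inverse=True)
binary = [[1 if value.real > 0.5 else 0 for value in row] for row in filtered]

for row in binary:
    print("".join("#" if pixel else "." for pixel in row))
```

In this toy case the lines vanish but the stroke is also broken where it crossed a line, which is the same kind of damage we have to live with on the real document.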


Figure 3. (Top) The inverted image with its Fourier transform. (Bottom right) The mask used to filter the lines. (Bottom left) The resulting binarized image after filtering.

After removing the lines, we go on to extracting the handwriting. We cut out only a few selected regions of the image for this. Our first step is to binarize the selected regions using a threshold, just like in the past activities. We then apply some morphological operations, hoping that they will result in clear, readable, and distinct characters separated from each other, and label each character using bwlabel(). Unfortunately, this task is very difficult to accomplish. The handwriting throughout the document overlaps with unwanted structures, and since the handwriting also has a lower gray value than those structures, separating them becomes even more difficult. Furthermore, if we are not careful, the morphological operations destroy the letters instead of making them more distinct. Figure 4 shows the results after morphology; here I used the skel() function of Scilab to reduce the characters to single-pixel width. The results, however, show that the final product does not actually look like letters. Still, the shapes are similar to the original handwriting, and to some degree the characters were separated from each other.
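To make the labeling step concrete, here is a small Python sketch of connected-component labeling on a binary image. It is a stand-in for what bwlabel() does conceptually, not its actual implementation; the function name `label_components`, the choice of 8-connectivity, and the toy image are assumptions for illustration.

```python
from collections import deque

def label_components(binary):
    """Label 8-connected foreground blobs; returns (labels, count)."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and labels[r][c] == 0:
                # New blob found: flood-fill it with the next label.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny][nx] and labels[ny][nx] == 0):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count

# Toy binary image with three separate blobs.
binary = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
labels, count = label_components(binary)
print(count)
for row in labels:
    print(row)
```

This also shows why overlapping structures are such a problem: any stray pixel that touches two characters merges them into a single label.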



Figure 4. (Left) The selected area of the document with handwriting and (Right) the result of extracting the handwriting using different morphological operations.

Our last task is to find instances of the word "description" within the document. We did this by taking a template of the word "description" and correlating it with the image. The correlation is done in the same way as in Activity 5: we simply multiply the FT of the template by the conjugate of the FT of the image and then take the FT of this product. If the correlation works, the result is an image that has its highest values at the instances of the word "description". Figure 5 shows the template and the result of locating these instances. Comparing the image with the result of the correlation, we see that, as expected, the correlation peaks at the locations of the word "description". This result means that our method for locating a template works even if the quality of the image is not ideal.
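The correlation recipe above can be checked on a toy example. The sketch below is in Python with a naive DFT, not the Scilab code used for Figure 5; the 8x8 image, the small template, the top-left zero padding, and the peak threshold are all assumptions for illustration. For real-valued inputs, the forward FT of the product equals the circular cross-correlation multiplied by the number of pixels, so the peaks land at the top-left corner of each match.

```python
import cmath

def dft2(a):
    """Naive forward 2-D DFT, O(N^4); only suitable for a tiny demo."""
    rows, cols = len(a), len(a[0])
    return [[sum(a[y][x] * cmath.exp(-2j * cmath.pi * (u * y / rows + v * x / cols))
                 for y in range(rows) for x in range(cols))
             for v in range(cols)]
            for u in range(rows)]

def correlate(image, template):
    """FT(FT(template) * conj(FT(image))), scaled by the pixel count."""
    rows, cols = len(image), len(image[0])
    # Zero-pad the template to the image size, anchored at the top-left.
    padded = [[0] * cols for _ in range(rows)]
    for y, row in enumerate(template):
        for x, value in enumerate(row):
            padded[y][x] = value
    ft_template = dft2(padded)
    ft_image = dft2(image)
    product = [[ft_template[u][v] * ft_image[u][v].conjugate()
                for v in range(cols)] for u in range(rows)]
    result = dft2(product)
    return [[result[y][x].real / (rows * cols) for x in range(cols)]
            for y in range(rows)]

# Toy data: the same small pattern pasted at two places in an 8x8 image.
template = [[1, 0, 1],
            [1, 1, 1]]
image = [[0] * 8 for _ in range(8)]
for top, left in [(1, 1), (5, 4)]:
    for y, row in enumerate(template):
        for x, value in enumerate(row):
            image[top + y][left + x] = value

corr = correlate(image, template)
# A full match scores 5 (the number of ones in the template).
peaks = [(y, x) for y in range(8) for x in range(8) if corr[y][x] > 4.5]
print(peaks)
```

If the template is instead stored centered in a full-size image, the peaks are displaced by the template's offset from the origin, so the padding convention matters when reading positions off the result.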


Figure 5. The binarized image and mask used to locate the instances of the word "description". The result of the correlation, indicating the instances, is shown at the bottom.

I give myself a grade of 9 in this activity.
